new interface for samples and fba, fva solutions and workaround for #96#98
new interface for samples and fba, fva solutions and workaround for #96#98hariszaf wants to merge 9 commits intoGeomScale:developfrom
Conversation
vfisikop
left a comment
There was a problem hiding this comment.
Thanks! These are really cool stuff.
I have some comments mostly on the efficiency of the proposed changes.
| The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution. | ||
|
|
||
| ```python | ||
| fva_output_df = model.fva_to_df() |
There was a problem hiding this comment.
Cool. Do we still need the old method model.fva() that changes the internal state of model? I suppose that one function is enough. Then the new one could be called fva() it is more clear I think. And the old one should be made private. Actually it is even simpler if there exist only one function fva() with the new interface.
There was a problem hiding this comment.
It's a bit tricky as the original fva returns a set of 4 things that we use in other parts of the PolytopeSampler class. Yet, I changes those parts with an _fva and _fba and we now have fva() and fba() to return the dataframes. 😉
There was a problem hiding this comment.
OK, but as the name suggests _fva and _fba should be private, right?
Also, I still cannot see the need of two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and then the user (if needed) compute a dataframe on demand (it is one line of code).
| self._biomass_function, | ||
| self._parameters["opt_percentage"], | ||
| ) | ||
| self._min_fluxes = min_fluxes |
There was a problem hiding this comment.
Do we really need them to be stored in the model?
There was a problem hiding this comment.
My initial thought was about the sample_from_fva_output function that used to get as arguments the min_fluxes and the max_fluxes.
At first, I thought of keeping those in the model to provide them in the function but then I thought it would be simpler if the fva is performed with the sample_from_fva_output function.
Yet, maybe it's practical to keep them. No strong arguments on that.
There was a problem hiding this comment.
Sorry, I was lost a bit. Are those class variables used even after that changes of this PR? As far as understand sample_from_fva_output function is not using them.
| For example: | ||
|
|
||
| ```python | ||
| initial_medium = model.medium |
There was a problem hiding this comment.
I think this can be simplified avoiding to copy the entire dictionary (see #95 (comment))
There was a problem hiding this comment.
check for my reply on the issue.
if you have any ideas for alternatives, let me know!
There was a problem hiding this comment.
OK, I see so there is still an issue in setting the model correctly. Should we address it in a different PR?
|
|
||
| return steady_states | ||
|
|
||
| steady_states_df = pd.DataFrame(steady_states, index = self._metabolic_network.reactions) |
There was a problem hiding this comment.
Do we really want to make that transformation by default? As far as I can understand that could be memory intensive (especially for large matrices). I think it is better to add a method that takes steady_states and create a df or simply write that one line of code in the doc or README.
| return steady_states | ||
|
|
||
| @staticmethod | ||
| def samples_as_df(model, samples): |
dingo/loading_models.py
Outdated
| metabolites_map.columns = ["metabolite_name"] | ||
|
|
||
| return lb, ub, S, metabolites, reactions, \ | ||
| biomass_index, biomass_function, medium, inter_medium, exchanges, \ |
There was a problem hiding this comment.
Could this be in two lines instead of three?
dingo/loading_models.py
Outdated
|
|
||
| return lb, ub, S, metabolites, reactions, \ | ||
| biomass_index, biomass_function, medium, inter_medium, exchanges, \ | ||
| reactions_map, metabolites_map |
There was a problem hiding this comment.
I think you should also update the documentation of the function with medium, ..., metabolites_map
| model.set_slow_mode() | ||
|
|
||
| # Check if script is running in GitHub action | ||
| if os.getenv('CI', 'false').lower() == 'true': |
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "execution_count": 3, |
There was a problem hiding this comment.
Unfortunately, notebooks cannot be reviewed in github... See https://github.com/orgs/community/discussions/12959
| self._N_shift = [] | ||
| self._T = [] | ||
| self._T_shift = [] | ||
| self._metabolic_network = copy.deepcopy(metabol_net) |
There was a problem hiding this comment.
I think making a deep copy just shifting the problem (see #96 (comment))
Shouldn't the sampler only transform A and b and leave the model as is?
There was a problem hiding this comment.
I think that this has to do with the fast_remove_redundant_facets.
I suggest we keep that as is for this PR so we have something working fine for now, and we open a new PR for the fast_remove_redundant_facets.
There was a problem hiding this comment.
OK let's go with the workaround for now, but please keep the issue open until there is a proper fix. Thanks!
vfisikop
left a comment
There was a problem hiding this comment.
Hi again, sorry for late reply.
Let me comment to each change in this PR separately,
- Regarding dataframes, is it simpler just to let is to the user to create a dataframe if needed, it is just a call
pd.DataFrame(data, index)at the end of the day. This way we avoid complicating the interface. - Regarding #95 and medium I propose to handle it in a separate PR to solves in fully (it seems that now it is a work in progress)
- Regarding #96 I agree to use the deepcopy workaround (but please keep the issue open)
| The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution. | ||
|
|
||
| ```python | ||
| fva_output_df = model.fva_to_df() |
There was a problem hiding this comment.
OK, but as the name suggests _fva and _fba should be private, right?
Also, I still cannot see the need of two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and then the user (if needed) compute a dataframe on demand (it is one line of code).
| For example: | ||
|
|
||
| ```python | ||
| initial_medium = model.medium |
There was a problem hiding this comment.
OK, I see so there is still an issue in setting the model correctly. Should we address it in a different PR?
| self._N_shift = [] | ||
| self._T = [] | ||
| self._T_shift = [] | ||
| self._metabolic_network = copy.deepcopy(metabol_net) |
There was a problem hiding this comment.
OK let's go with the workaround for now, but please keep the issue open until there is a proper fix. Thanks!
| self._biomass_function, | ||
| self._parameters["opt_percentage"], | ||
| ) | ||
| self._min_fluxes = min_fluxes |
There was a problem hiding this comment.
Sorry, I was lost a bit. Are those class variables used even after that changes of this PR? As far as understand sample_from_fva_output function is not using them.
Aim of this PR is:
With respect to aim (1) instead of searching for the index of a reaction, the user gets a dataframe with the reaction ids as indices.
and
A relevant interface is available for fba and fva solutions too.
There is also a
reactions_mapto link a reaction's id to its complete name (quite useful in case of modelseed models).Regarding aim (2), an example of how to use the
mediumwas added in the README file and the setter was fixed (#95), while usingdeepcopyfor getting the bounds addressed #96