[BI-2578][BI-2489] - Optimize BrAPI Germplasm Search #447
Description
BI-2578
Updated the BrAPIGermplasmDAO and the BrAPIDAOUtil to fetch all existing germplasm records at once for a program, without pagination. This lets the BrAPI server handle the request, and it resolves BI-2489 for Germplasm cache fetches. This feature is configurable via the new CACHE_PAGINATE_GERMPLASM env var, added to the template and application.yml. If memory is exhausted while loading a particular program's germplasm records, this var should be set to true.
CACHE_BRAPI_FETCH_PAGE_SIZE: the value of this var should match the paging.page-size.max-allowed application property for the test server. If it is over the test server value, all requests will result in a 400.
Dependencies
This code is tied to this MR on the BrAPI Prod server. Once that code is merged, this code can be merged and the feature can be tested end to end.
I've also added new configurable variables, and created an MR for the docker stack with those same variables.
Testing
With a substantial database of germplasm records (more than 65k), start the application to load the cache. With BI-2489 resolved, the cache should load without failing, and all germplasm data will be retrieved at once per program. There is a limit of 250k records per program. If clients approach that number, it will be time to move to a cacheless, request-based implementation of the germplasm fetch. At that point, I would recommend a cacheless implementation for all entities.
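For reviewers who want a mental model of the two fetch modes while testing, here is a minimal sketch of the toggle. It is not the code in this PR: GermplasmFetchSketch, GermplasmSource, fetchAllAtOnce and fetchPage are hypothetical names, and the paginate and pageSize parameters only stand in for the CACHE_PAGINATE_GERMPLASM and CACHE_BRAPI_FETCH_PAGE_SIZE settings.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the real logic lives in BrAPIGermplasmDAO and BrAPIDAOUtil. */
final class GermplasmFetchSketch {

    /** Hypothetical stand-in for the BrAPI client calls made for one program. */
    interface GermplasmSource<T> {
        /** One unpaginated request that returns every record. */
        List<T> fetchAllAtOnce();

        /** One paginated request; page is zero-based. */
        List<T> fetchPage(int page, int pageSize);
    }

    /**
     * Returns all records for a program.
     *
     * @param paginate assumed to mirror CACHE_PAGINATE_GERMPLASM
     * @param pageSize assumed to mirror CACHE_BRAPI_FETCH_PAGE_SIZE
     */
    static <T> List<T> fetchAll(GermplasmSource<T> source, boolean paginate, int pageSize) {
        if (!paginate) {
            // Default path: let the server return everything in one response.
            return source.fetchAllAtOnce();
        }
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Fallback path: bounded requests, stopping at the first short page.
        List<T> all = new ArrayList<>();
        int page = 0;
        while (true) {
            List<T> batch = source.fetchPage(page, pageSize);
            all.addAll(batch);
            if (batch.size() < pageSize) {
                return all;
            }
            page++;
        }
    }
}
```

In this sketch the paginated path trades more round trips for a bounded response size per request, which is the reason to enable pagination when a single program's records exhaust memory.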
There is other testing to be done with BI-2578, but that is more on the prod server side than bi-api, so I will leave it to you guys to look at that and test there.