
Conversation


@jloux-brapi commented Mar 17, 2025

Description

BI-2578

  • Updated the BrAPIGermplasmDAO and the BrAPIDAOUtil to fetch all existing germplasm records for a program at once, without pagination. This lets the BrAPI server handle the request in a single call and resolves BI-2489 for germplasm cache fetches. The behavior is configurable via the new CACHE_PAGINATE_GERMPLASM env var, added to the template and application.yml. If memory gets exhausted for a particular program's germplasm records, this var should be set to true.
  • Updated the BrAPIDAOUtil so that, for all other entities, it fetches the cache 65,000 records at a time to avoid other SQL errors that could arise from BI-2489 (a minimal sketch of this chunked fetch follows this list). If this turns out to be slow for these entities, we can increase this amount, but a better solution IMO would be to get rid of the cache entirely for all entities and hit the BrAPI test server with smaller pages, so users only hit the test server when they need it (and hit it much less hard), with less data being transmitted. The maximum number of records per page fetch for the program cache is now configurable via the new CACHE_BRAPI_FETCH_PAGE_SIZE env var. Its value should match the paging.page-size.max-allowed application property for the test server; if it exceeds the test server's value, all requests will result in a 400.
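
For reviewers who want a feel for the two fetch paths without digging into BrAPIDAOUtil, here is a minimal, self-contained Java sketch of the approach described above. The class and method names (ChunkedFetchSketch, fetchAll, fetchGermplasm, the PageFetcher-style BiFunction) are hypothetical illustrations, not the actual bi-api code; the defaults simply mirror the values mentioned in this PR (a configurable page size such as 65,000, and an optional single-request germplasm fetch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch of the cache-fetch strategy described in this PR.
 * Not the actual BrAPIDAOUtil code; names and defaults are illustrative only.
 */
public class ChunkedFetchSketch {

    // Mirrors CACHE_BRAPI_FETCH_PAGE_SIZE; this should match the BrAPI server's
    // paging.page-size.max-allowed, otherwise page requests come back as 400s.
    private final int fetchPageSize;

    // Mirrors CACHE_PAGINATE_GERMPLASM; when false, germplasm is requested
    // in a single unpaginated call per program.
    private final boolean paginateGermplasm;

    public ChunkedFetchSketch(int fetchPageSize, boolean paginateGermplasm) {
        this.fetchPageSize = fetchPageSize;
        this.paginateGermplasm = paginateGermplasm;
    }

    /**
     * Fetch all records for a non-germplasm entity, one page at a time.
     * The fetcher takes (page, pageSize) and returns one page of results;
     * a short or empty page signals the end of the data.
     */
    public <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        int page = 0;
        while (true) {
            List<T> batch = fetchPage.apply(page, fetchPageSize);
            all.addAll(batch);
            if (batch.size() < fetchPageSize) {
                break; // last (possibly partial) page reached
            }
            page++;
        }
        return all;
    }

    /**
     * Germplasm: by default fetch everything for the program in one request;
     * fall back to chunked paging if CACHE_PAGINATE_GERMPLASM is enabled
     * because memory is a concern.
     */
    public <T> List<T> fetchGermplasm(BiFunction<Integer, Integer, List<T>> fetchPage) {
        if (paginateGermplasm) {
            return fetchAll(fetchPage);
        }
        // Single unpaginated request; Integer.MAX_VALUE stands in for
        // "no pagination" and is purely illustrative here.
        return fetchPage.apply(0, Integer.MAX_VALUE);
    }
}
```

The key point of the sketch is the trade-off: the non-germplasm path never asks the server for more than fetchPageSize records per request, while the germplasm path trades memory for a single round trip per program.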

Dependencies

This code is tied to this MR on the BrAPI Prod server. Once that code is merged, this code can be merged and the feature can be tested end to end.

I've also added new configurable variables, and created an MR for the docker stack with those same variables.

Testing

With a substantial database of germplasm records (more than 65k), start the application to load the cache. The cache should load without failing, thanks to the fix for BI-2489, and all germplasm data will be retrieved at once per program. There is a per-program limit of 250k records; if clients approach that number, it will be time to move to a cacheless, request-based implementation of the germplasm fetch. (At that point, I would recommend a cacheless implementation for all entities.)

There is other testing to be done for BI-2578, but that is more on the prod server side than bi-api, so I'll leave it to you to look at and test there.


@mlm483 left a comment


I tested locally with the BJTS changes and it works well; I was able to fetch 68k germplasm in 10 seconds after flushing the cache.

@nickpalladino merged commit 5159c9d into develop on Apr 16, 2025
1 check passed
@nickpalladino deleted the feature/BI-2578 branch on April 16, 2025