# Migrating Test Server Data to Production Server Data
Apart from optimizations for specific use cases, there's one major difference between the
BrAPI Java test server and the production server:

**All id columns use the UUID data type instead of TEXT in the production database.**

This change was made out of a need for:
1. Standardization around good data practices
2. Performance optimization

The performance difference between TEXT and UUID columns might not be felt in use cases
with small amounts of data, but in large batch operations this optimization can roughly double query speed.

This database schema change doesn't affect only the production server's DB; the codebase has also been modified to
accommodate this data standardization.

This document will help you prepare for a data migration to UUID as the standard ID column type.

## Do you really need to migrate test server data?
Here at BrAPI, we hope you have been using the test server for non-production data only. Since the introduction of
the BrAPI Java Production Server, the test server is intended purely for testing your application before you go live.

If that's the case, there is nothing you need to do. Simply build the application against an empty DB, and the schema
will be generated with UUID as the ID column type.

However, since the production server was only recently introduced, we realize this is likely not the case.

If you have been using the test server with production data, there are several steps you will need to take to swap over
to the production data model.

This document covers those steps.

## Step 1: Undo Dummy Migration Data
The BrAPI Java Test Server kicks off some migration scripts the first time you build the app, inserting dummy data
that you can look at and query to understand how the data model works.

Unfortunately, much of this dummy data uses non-UUID identifiers.

As such, you will need to remove this dummy data from your database in order to proceed with the migration.

To do this, find the `undo_dummy_data` folder and, going one by one and in order, run each undo migration script
in the SQL execution tool of your choice to remove all of this data from the database.
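
If you connect with `psql`, one convenient option is its `\i` meta-command, which executes a script file. The file names below are placeholders for whatever the scripts in the folder are actually called:

```sql
\i undo_dummy_data/01_undo_crops.sql
\i undo_dummy_data/02_undo_programs.sql
-- ...and so on, in order, through the remaining undo scripts
```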

## Step 2: Validate id columns
After removing the dummy data, you will next want to check whether non-UUID data still exists in other id columns.

This should only be possible via other migration scripts your application has applied that included invalid UUID data, as
the BrAPI test server does create UUIDs by default.

To do this validation, create the stored procedure we have written in your DB instance, then run it to verify your data.

This script is found in the `validate_id_columns.sql` file provided in this directory.

There are two notable id columns that are deliberately not validated by this script:
* `external_reference.external_reference_id`, which is in fact not a UUID column as defined by the production server spec. This is because this ID is supposed to be flexible to whatever id the client sends.
* `table.auth_user_id`, for each table that has one. This column is a known issue and will be resolved in a later step: by default, the test server inserts `anonymousUser` as the `auth_user_id` when one isn't sent in the request. More on this in Step 4 below.

Once you have run the validation script, it should report any tables and associated id columns that contain invalid UUIDs.
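
For reference, the kind of check the procedure performs looks roughly like the following. This is an illustrative sketch rather than the provided script, and `germplasm` is just an example table name:

```sql
-- Find values in an id column that are not well-formed UUIDs.
SELECT id
FROM germplasm
WHERE id !~* '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$';
```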

If such columns were found, you can run `retrieve_table_data_with_invalid_id_cols.sql` for any of the columns that have invalid data
to grab the bad rows. If the data doesn't fall under any of the steps outlined in this document, you will have to resolve it on your own, or you can reach out to someone on the BrAPI team.

It's likely that once offending data is found, it is referenced as a foreign key in other tables.

If that is the case, and the data (and its associated references) can't simply be removed, you will have to go through the process of inserting a new row (with a correctly generated UUID) and reassigning the foreign keys pointing to the old id
to the new one. An example has been done for you in `migrate_crops.sql`. There may be a way to iterate through all the IDs, but hopefully
the amount of data you have is small enough to do it one by one.
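
As a rough sketch of that pattern (the table names, column names, and ids below are illustrative, not taken from `migrate_crops.sql`):

```sql
BEGIN;

-- 1. Re-insert the offending row under a valid UUID.
--    'maize' stands in for the invalid id being replaced.
INSERT INTO crop (id, crop_name)
SELECT '11111111-2222-3333-4444-555555555555', crop_name
FROM crop
WHERE id = 'maize';

-- 2. Repoint every foreign key that referenced the old id.
UPDATE program
SET crop_id = '11111111-2222-3333-4444-555555555555'
WHERE crop_id = 'maize';

-- 3. Remove the old row once nothing references it anymore.
DELETE FROM crop WHERE id = 'maize';

COMMIT;
```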

## Step 3: Dump the database

At this point, having validated that the schema is rid of non-UUID id data (save `auth_user_id`, more on that soon),
you are ready to do a `pg_dump`.

You can accomplish this with the following command:

`pg_dump -U db_username -d db_name --data-only > dump.sql`

where `db_username` is the username you log in to your database with, and `db_name` is the name of the database you want to dump.

This command grabs only the data associated with each table; it will not copy the schema. It places the results in a `dump.sql`
file in the directory you ran the command from.

If your database lives in a Docker container, the command will look something like this:

`docker exec db_container_name pg_dump -U db_username -d db_name --data-only > dump.sql`

To play it totally safe, let's also grab a copy of the data and the database schema together, in case something goes awry
in the next steps.

To do this, create another dump using:

`pg_dump -U db_username -d db_name > dump_with_schema.sql`

Or with Docker:

`docker exec db_container_name pg_dump -U db_username -d db_name > dump_with_schema.sql`

In the event that you somehow lose the original database, you can restore it by creating the database and simply loading the
`dump_with_schema.sql` file into it.
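
Assuming the database has been re-created and is empty, that restore would look something like:

`psql -U db_username -d db_name < dump_with_schema.sql`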

## Step 4: Modify the dumpfile

**NOTE: We are talking about the `dump.sql` file without the schema here. The schema file we created in Step 3 should remain unchanged.**

As stated previously, we've largely been ignoring all the `anonymousUser`-filled `auth_user_id` columns. These do in fact need
to become UUIDs, and for that, we need to line them up with the new expected default UUID for that column: `'AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA'`.

Iterating through every table and modifying this column would take too long, so instead just run a find and replace on the exported dump file
in your text editor of choice, swapping `anonymousUser` for the UUID above. If you have a large amount of data, this operation might be too
much for the editor, in which case you would likely have to edit these columns in the database before exporting.
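
If you take the in-database route, a per-table update along these lines should work (run it against the old database before the dump in Step 3; `program` is just one example, so repeat for every table with an `auth_user_id` column):

```sql
UPDATE program
SET auth_user_id = 'AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA'
WHERE auth_user_id = 'anonymousUser';
```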

## Step 5: Create and load the new database

Now that the dumpfile has been modified, we are ready to load the new database.
To do this, use `psql` to log in to the PostgreSQL server, wherever it is hosted.

If your database is hosted in a Docker container, this looks like:

`docker exec -it name_of_db_container psql -U db_username db_name`

Once in the `psql` CLI, you need to create your new database. Call it something like:

`CREATE DATABASE db_name_uuid;`

If you want to keep your old database name, you can eventually rename this database once the old one has been removed.
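
When that time comes, the rename is a single statement (run it while connected to a different database, such as `postgres`, since PostgreSQL won't rename the database you're currently connected to):

```sql
ALTER DATABASE db_name_uuid RENAME TO db_name;
```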

Next, find the `application.properties` file of the BrAPI server code and change `spring.datasource.url` so that it points to the new database we just created.
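
Assuming a locally hosted PostgreSQL on the default port, that entry would look something like:

`spring.datasource.url=jdbc:postgresql://localhost:5432/db_name_uuid`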

Now, build the server application as normal with `mvn clean install`.

Then run the application as normal:

`java '-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=localhost:5006' -jar target/brapi-Java-TestServer*.jar`

This should trigger Flyway to create the database using the initial schema, which was modified for the production server to
generate all `id` columns and their associated foreign keys with the UUID type.

You can verify this was successful by checking in the `psql` CLI that the tables were created, using

`\dt`

and you can further check that the schema was created with UUID-typed `id` columns by picking a table and running the table description command, like

`\d program`

which should look something like:

```
                       Table "public.program"
       Column        |  Type   | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
 id                  | uuid    |           | not null |
 additional_info     | jsonb   |           |          |
 auth_user_id        | uuid    |           |          |
 abbreviation        | text    |           |          |
 documentationurl    | text    |           |          |
 funding_information | text    |           |          |
 name                | text    |           |          |
 objective           | text    |           |          |
 program_type        | integer |           |          |
 crop_id             | uuid    |           |          |
 lead_person_id      | uuid    |           |          |
```

Now that the database has been created, all that's left is to load our dump file into it.

To do this, give `psql` the file as input:

`psql -U db_username db_name_uuid < dump.sql`

For Docker, because the file isn't inside the container, you need to pipe it in using `cat`:

`cat dump.sql | docker exec -i name_of_db_container psql -U db_username db_name_uuid`

This should kick off COPY statements for every table. Ensure that there are no errors.

If an error happens for any of the COPY statements, the entire table it was trying to copy will fail to load.

An error on `flyway_schema_history` is expected in most cases and is nothing to worry about: you really only want the new Flyway
schema history created by the migrations run the first time you started the app.
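
If you'd like to confirm which migrations the new database recorded, a quick query over the standard Flyway history table should do it:

```sql
SELECT installed_rank, version, description, success
FROM flyway_schema_history
ORDER BY installed_rank;
```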

If the errors happened on other tables, you might have to do some sleuthing to figure out why, then run this step again.

## Congrats!

With this, you should have successfully migrated the test server DB to the production DB.