Merged
35 changes: 35 additions & 0 deletions servicecontrol/troubleshooting.md
@@ -384,3 +384,38 @@ To mitigate growth or not having enough storage:
7. Scale out audit storage over multiple disks and/or machines:

- [ServiceControl remote instances Sharding audit messages with split audit queues](/servicecontrol/servicecontrol-instances/remotes.md#overview-sharding-audit-messages-with-split-audit-queues)

## Audit instances: Corrupted indexes or corrupted database after a service shutdown

When the following conditions are met:

- ServiceControl Audit instances are installed on Windows as a service
- The audit database is very large (> 500 GB)
- There is a constant load on the database due to:
  - Continuous ingestion of messages from the audit queue
  - Message expiration deleting expired audit messages
- Database indexes use the Corax indexing engine

At service shutdown, ServiceControl may take a long time to stop and, in most cases, does not shut down gracefully, because the RavenDB database is busy updating indexes (due to ingestion) and cleaning up tombstones (due to retention). The ungraceful shutdown can leave the indexes or the database corrupted.
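
The conditions above can be read as a simple checklist. The following is a minimal sketch, not part of ServiceControl: the `AuditInstance` type and its field names are hypothetical, the threshold is read as 500 gigabytes, and both sources of load (ingestion and expiration) are treated as required, which is an interpretation of the list.

```python
from dataclasses import dataclass

# Threshold from the condition list; reading it as gigabytes is an assumption.
LARGE_DB_THRESHOLD_GB = 500


@dataclass
class AuditInstance:
    """Hypothetical description of a ServiceControl Audit instance."""
    runs_as_windows_service: bool
    database_size_gb: float
    ingesting_continuously: bool
    expiration_active: bool
    indexing_engine: str  # "Corax" or "Lucene"


def at_risk_of_ungraceful_shutdown(instance: AuditInstance) -> bool:
    """Return True only when every listed condition holds."""
    constant_load = instance.ingesting_continuously and instance.expiration_active
    return (
        instance.runs_as_windows_service
        and instance.database_size_gb > LARGE_DB_THRESHOLD_GB
        and constant_load
        and instance.indexing_engine.lower() == "corax"
    )
```

An instance that matches every condition except the indexing engine (for example, one already using Lucene) is not flagged by this check.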

To mitigate this situation, migrate the indexes from the Corax indexing engine to the Lucene indexing engine. It might be sufficient to migrate only the `MessagesViewIndex` (regardless of whether full-text search is enabled), which is the index with the highest load.
Member

Could this backfire, and other issues occur?
We went with Corax when we upgraded to v6, so there must be a reason. It's newer and faster, maybe!

Given that you mentioned Windows, would Linux hosting be an alternative?
Or could we start promoting and supporting external RavenDB? I'm pretty sure everything works, as the containers have proven.

Member Author

> Could this backfire, and other issues occur?

All the tests are green when run against Lucene. There should be no issues; Lucene has been there since day one and, despite being slower, seems more solid than Corax.

> We went with Corax when we upgraded to v6, so there must be a reason. It's newer and faster, maybe!

The main reason was that RavenDB told us that, sooner or later, Lucene will be removed, although there is no date for that yet. The other was the promised performance increase, but it turns out that for very large databases it comes with undesirable side effects.

> Given that you mentioned Windows, would Linux hosting be an alternative? Or could we start promoting and supporting external RavenDB? I'm pretty sure everything works, as the containers have proven.

That's not so easy, unfortunately. We're also testing external RavenDB without using containers, and a few assumptions in SCMU break when using an external RavenDB server. A similar issue happens on Linux containers too, but it rarely results in a corrupted database because containers have a different lifecycle.


To migrate indexes from the Corax to the Lucene indexing engine, perform the following steps:

1. Start the ServiceControl Audit instance in [maintenance mode](/servicecontrol/ravendb/accessing-database.md#windows-deployment-maintenance-mode)
2. Access the RavenDB studio
3. Edit the index that needs to be changed
4. Open the Configuration tab of the edit index view
5. Change the indexing engine from Corax or Corax (inherited) to Lucene
6. Click Save

At this point, there will be two indexes: the original one and the new one that uses the Lucene indexing engine. The RavenDB studio will offer the option to swap them. The swap operation will:

- Make the Lucene index the default
- Delete the Corax index

After the swap operation, the new Lucene-based index must be rebuilt. Depending on the index size, the operation might take a long time.
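
The save-then-swap sequence described above can be pictured with a toy model. This is a sketch for illustration only: `IndexStore` and its methods are hypothetical names, not RavenDB or ServiceControl code, and the rebuild that follows the swap is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class IndexStore:
    """Toy stand-in for a database's index list; not RavenDB code."""
    active: dict = field(default_factory=dict)   # index name -> engine in use
    pending: dict = field(default_factory=dict)  # index name -> engine of the new copy

    def save_with_engine(self, name: str, engine: str) -> None:
        """Saving an edited definition creates a second index next to the original."""
        if name not in self.active:
            raise KeyError(name)
        self.pending[name] = engine

    def swap(self, name: str) -> None:
        """Make the new index the default and delete the original one."""
        self.active[name] = self.pending.pop(name)


store = IndexStore(active={"MessagesViewIndex": "Corax"})
store.save_with_engine("MessagesViewIndex", "Lucene")

# Before the swap, the original and the new index exist side by side.
both_exist = "MessagesViewIndex" in store.active and "MessagesViewIndex" in store.pending

store.swap("MessagesViewIndex")
```

After `swap`, only the Lucene-based entry remains in `active`, which mirrors the two effects listed for the swap operation.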

When ServiceControl is restarted, the Corax-based index may get recreated. To prevent the ServiceControl instance from recreating the index, the index can be locked.

To lock an index, while ServiceControl is still in maintenance mode, open the RavenDB studio, look for the index that was set to use Lucene, and click the `🔓 Unlocked` button. Change the setting to `🔒 Locked` ([Locked Ignore](https://ravendb.net/docs/article-page/7.0/csharp/client-api/operations/maintenance/indexes/set-index-lock#lock-modes)). The RavenDB studio will confirm that the operation completed with the message: _Lock mode was set to: Locked (ignore)_.
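
To illustrate why the lock matters, here is a toy model of how a locked index reacts when a definition is deployed again at startup. It is a sketch under stated assumptions: the `Index` class, `LockMode` enum, and `deploy_definition` method are hypothetical, and the only behavior taken from the RavenDB documentation is that, in Locked (ignore) mode, definition changes are not applied and no error is raised.

```python
from enum import Enum


class LockMode(Enum):
    UNLOCKED = "unlocked"
    LOCKED_IGNORE = "locked-ignore"


class Index:
    """Toy index with an engine and a lock mode; not RavenDB code."""

    def __init__(self, name: str, engine: str, lock: LockMode = LockMode.UNLOCKED):
        self.name = name
        self.engine = engine
        self.lock = lock

    def deploy_definition(self, engine: str) -> bool:
        """Apply a definition pushed at startup; return True if it was applied."""
        if self.lock is LockMode.LOCKED_IGNORE:
            return False  # change is ignored without an error, the index is kept
        self.engine = engine
        return True


# Restart without a lock: the Corax definition replaces the Lucene one.
unlocked = Index("MessagesViewIndex", "Lucene")
unlocked.deploy_definition("Corax")

# Same restart with the index locked: the Lucene-based index is kept.
locked = Index("MessagesViewIndex", "Lucene", LockMode.LOCKED_IGNORE)
locked.deploy_definition("Corax")
```

In this model the unlocked index ends up on Corax again, while the locked one stays on Lucene.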