Adding --direct-io flag for model loading #18166
base: master

Conversation
I think you need this:

```diff
diff --git a/src/llama-model-loader.cpp b/src/llama-model-loader.cpp
index 1355eea95..2db2115a0 100644
--- a/src/llama-model-loader.cpp
+++ b/src/llama-model-loader.cpp
@@ -918,8 +918,7 @@ void llama_model_loader::load_data_for(struct ggml_tensor * cur) const {
         GGML_ASSERT(cur->data != nullptr);
         GGML_ASSERT(w.idx < files.size());
         const auto & file = files.at(w.idx);
-        file->seek(w.offs, SEEK_SET);
-        file->read_raw(cur->data, ggml_nbytes(cur));
+        file->read_raw_at(cur->data, ggml_nbytes(cur), w.offs);
     }
     if (check_tensors && !ggml_validate_row_data(cur->type, cur->data, ggml_nbytes(cur))) {
```

Probably need to assert that …
Thanks for the hint, changed that. I think an assert would not work as …
Ok. Would we even need to have …

Also some suggestions that I have not tested, but should at least convey what I mean:

```diff
diff --git a/src/llama-model-loader.cpp b/src/llama-model-loader.cpp
index 2db2115a0..ae0c698be 100644
--- a/src/llama-model-loader.cpp
+++ b/src/llama-model-loader.cpp
@@ -508,8 +508,11 @@ llama_model_loader::llama_model_loader(
         files.emplace_back(new llama_file(fname.c_str(), "rb", use_direct_io));
         contexts.emplace_back(ctx);

-        // Disable mmap in case Direct I/O is enabled and available
-        if (use_direct_io && files.at(0)->has_direct_io()) {
+        // check if direct io is enabled and supported
+        use_direct_io = use_direct_io && files.back()->has_direct_io();
+
+        if (use_direct_io && use_mmap) {
+            LLAMA_LOG_WARN("%s: direct I/O is enabled, disabling mmap\n", __func__);
             use_mmap = false;
         }
@@ -581,6 +584,10 @@ llama_model_loader::llama_model_loader(
             files.emplace_back(new llama_file(fname_split, "rb", use_direct_io));
             contexts.emplace_back(ctx);

+            if (use_direct_io && !files.back()->has_direct_io()) {
+                throw std::runtime_error(format("unexpected: direct I/O is not supported for split file %s", fname_split));
+            }
+
             // Save tensors data offset info of the shard.
             for (ggml_tensor * cur = ggml_get_first_tensor(ctx); cur; cur = ggml_get_next_tensor(ctx, cur)) {
                 std::string tensor_name = std::string(cur->name);
@@ -722,6 +729,7 @@ llama_model_loader::llama_model_loader(
     }

     this->use_mmap = use_mmap;
+    this->use_direct_io = use_direct_io;
     this->check_tensors = check_tensors;
     this->no_alloc = no_alloc;
 }
diff --git a/src/llama-model-loader.h b/src/llama-model-loader.h
index de06b5283..6f15115ce 100644
--- a/src/llama-model-loader.h
+++ b/src/llama-model-loader.h
@@ -70,6 +70,7 @@ struct llama_model_loader {
     size_t n_bytes = 0;

     bool use_mmap = false;
+    bool use_direct_io = false;
     bool check_tensors;
     bool no_alloc;
diff --git a/src/llama-model.cpp b/src/llama-model.cpp
index cf0c39475..502859d2e 100644
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@@ -2337,7 +2337,8 @@ bool llama_model::load_tensors(llama_model_loader & ml) {
     const bool use_mmap_buffer = true;

-    LLAMA_LOG_INFO("%s: loading model tensors, this can take a while... (mmap = %s)\n", __func__, ml.use_mmap ? "true" : "false");
+    LLAMA_LOG_INFO("%s: loading model tensors, this can take a while... (mmap = %s, direct_io = %s)\n",
+            __func__, ml.use_mmap ? "true" : "false", ml.use_direct_io ? "true" : "false");

     // build a list of buffer types for the CPU and GPU devices
     pimpl->cpu_buft_list = make_cpu_buft_list(devices, params.use_extra_bufts, params.no_host);
```
Just an FYI: #18012 broke loading with mmap disabled on Windows.
@askmyteapot Thank you for the hint. The issue was that I used …
@ggerganov Should I isolate the changes with the Windows fix in a new PR? |
Yes, I would like to take an extra look at the changes here, so better to fix the Windows issue in the meantime. Thanks.
Follow up for PR #18012 (comment).

To enable direct I/O model reading by default on Linux and Windows, but to keep `--mmap` as the default on Mac, this PR adds an additional flag for enabling/disabling direct I/O. This flag defaults to true and overrules the mmap parameter: if `--direct-io` is set and direct I/O is available, `--mmap` gets disabled; if `--no-direct-io` is set or direct I/O is not available (e.g. on Mac), the specified mmap value is used.