
Commit af89bc8

Introduce per-geometry and overall limits on the number of expire tiles
The number of tiles to be expired can be quite large if the input geometries are large or if there are many geometries. Numbers of tiles in the billions can crash osm2pgsql because it runs out of memory. Such large numbers can also overwhelm any kind of re-rendering mechanism run after osm2pgsql to bring tiles up to date. In day-to-day processing this should not happen, but it can happen due to vandalism or misconfiguration.

To protect against this problem, this change introduces limits on the number of tiles that can be affected by a single geometry and on the overall number of tiles that an expire output will generate for each run of osm2pgsql.

* If a single geometry would result in the expiry of more than `max_tiles_geometry` tiles, this geometry will be ignored for the purposes of expiry. Note that the geometry will still be written to the database, but no tiles will be added to the expire output.
* If the number of tiles generated during a single run of osm2pgsql for an expire output grows beyond `max_tiles_overall`, no further tiles will be written to this output.

Limits are per expire output, of which you can have several. The limits can be set in the flex expire output configuration, but sensible defaults are provided. For the (legacy) expire output configured on the command line with the `-e` and `-o` options, the settings cannot be changed; you will always get the default values.

To choose the default values for these settings I looked at real-world values as follows:

* Russia has one of the largest boundaries on the planet. Expiry (boundary only) on zoom level 14 affects 94,144 tiles, on z15 190,168 tiles, and on z16 383,465 tiles. For typical raster tiles using 8x8 meta tiles, expiry on z16 is equivalent to showing z19 tiles. So 500,000 tiles seems to be a useful limit for `max_tiles_geometry`.
* For expiring the area I looked at the Greenland ice sheet, which needs more than 8 million tiles on z14. At least for vector tiles this is good enough; for raster tiles we might need more, though.
* For `max_tiles_overall`: Paul Norman analyzed the number of tiles expired by typical minutely updates in https://www.openstreetmap.org/user/pnorman/diary/403266. For zoom level 14 the most he got was 119,801 tiles. The same analysis also shows that for longer time frames (2 minutes and 5 minutes were checked, but the same should be true for larger intervals) the number of tiles doesn't go up, because these huge numbers only happen very rarely.

Rounding these numbers and adding a safety factor, values of 10,000,000 for a single geometry and 50,000,000 for the overall number of tiles per run seem reasonable. Memory use in osm2pgsql is about 32 bytes per tile, so this will need at most 1.6 GB (50,000,000 tiles × 32 bytes), which should be no problem at all. The numbers are chosen so that they will practically never be triggered, so users upgrading from existing versions of osm2pgsql will not suddenly be affected. It is recommended that users tune these settings according to their own needs. Once we have more operational experience with this, we can adjust the defaults.

I considered using different default maximum values for different zoom levels, but this would make configuration more complicated.

Change file processing in osm2pgsql runs in parallel threads. The old code stored the to-be-expired tiles in one list per thread and merged them later. This has two problems:

a) Because the lists might contain some of the same tiles, all lists together can use much more memory than a single list would take.
b) We cannot easily check the number of tiles in those lists against the configured maximum.

So this commit changes the way the list is kept: we only keep a single list in expire_output_t and use a mutex to control access to it. (There might still be overlapping lists if you have more than one expire output, but that's by design.) Objects of the expire_tiles_t class now only keep a temporary list for each geometry added. Once all tiles affected by a single geometry are identified, this list is added to the overall list in expire_output_t and the temporary list is cleared.

Fixes #2190
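The arithmetic behind these defaults can be checked at compile time. This is a small sketch rather than osm2pgsql code: the constant and function names are made up for illustration, and the 32 bytes per tile is the rough estimate quoted above, not a measured `sizeof`.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the commit message; the names are illustrative only.
constexpr std::uint64_t default_max_tiles_overall = 50'000'000;
constexpr std::uint64_t approx_bytes_per_tile = 32; // rough estimate, not sizeof

// Worst case for one expire output that reaches max_tiles_overall.
constexpr std::uint64_t worst_case_bytes =
    default_max_tiles_overall * approx_bytes_per_tile;
static_assert(worst_case_bytes == 1'600'000'000ULL, "about 1.6 GB");

// Zoom levels covered by one N x N metatile (N a power of two): an 8x8
// metatile spans 2^3 tiles per axis, i.e. three extra zoom levels.
constexpr unsigned metatile_zoom_offset(unsigned metatile_size)
{
    unsigned offset = 0;
    while (metatile_size > 1U) {
        metatile_size >>= 1U;
        ++offset;
    }
    return offset;
}
static_assert(16 + metatile_zoom_offset(8) == 19,
              "expiring 8x8 metatiles on z16 touches z19 tiles");

// The Russia boundary counts roughly double per zoom level
// (94144 -> 190168 -> 383465), as expected for a line-like feature,
// whereas the tile count of a filled area grows about fourfold.
static_assert(190'168 / 94'144 == 2 && 383'465 / 190'168 == 2,
              "boundary tile counts roughly double per zoom level");
```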
1 parent f297910 commit af89bc8

23 files changed: +442 -238 lines changed

src/expire-output.cpp

Lines changed: 54 additions & 4 deletions
@@ -17,16 +17,66 @@
 #include <cerrno>
 #include <system_error>
 
+void expire_output_t::add_tiles(
+    std::unordered_set<quadkey_t> const &dirty_tiles)
+{
+    std::lock_guard<std::mutex> const guard{*m_tiles_mutex};
+
+    if (m_overall_tile_limit_reached) {
+        return;
+    }
+
+    if (dirty_tiles.size() > m_max_tiles_geometry) {
+        log_warn("Tile limit {} reached for single geometry!",
+                 m_max_tiles_geometry);
+        return;
+    }
+
+    /**
+     * This check is not quite correct, because some tiles could be in both
+     * the dirty_list and m_tiles, which means we might not reach
+     * m_max_tiles_overall if we join those in. But this check is much
+     * easier and cheaper than trying to add all the tiles into the dirty_list,
+     * checking each time whether we reached the limit. And with the number
+     * of tiles involved it doesn't matter that much anyway.
+     */
+    if (dirty_tiles.size() + m_tiles.size() > m_max_tiles_overall) {
+        m_overall_tile_limit_reached = true;
+        log_warn("Overall tile limit {} reached for this run!",
+                 m_max_tiles_overall);
+        return;
+    }
+
+    m_tiles.insert(dirty_tiles.cbegin(), dirty_tiles.cend());
+}
+
+bool expire_output_t::empty() noexcept
+{
+    std::lock_guard<std::mutex> const guard{*m_tiles_mutex};
+    return m_tiles.empty();
+}
+
+quadkey_list_t expire_output_t::get_tiles()
+{
+    quadkey_list_t tile_list;
+
+    tile_list.reserve(m_tiles.size());
+    tile_list.assign(m_tiles.cbegin(), m_tiles.cend());
+    std::sort(tile_list.begin(), tile_list.end());
+    m_tiles.clear();
+
+    return tile_list;
+}
+
 std::size_t
-expire_output_t::output(quadkey_list_t const &tile_list,
-                        connection_params_t const &connection_params) const
+expire_output_t::output(connection_params_t const &connection_params)
 {
     std::size_t num = 0;
     if (!m_filename.empty()) {
-        num = output_tiles_to_file(tile_list);
+        num = output_tiles_to_file(get_tiles());
     }
     if (!m_table.empty()) {
-        num = output_tiles_to_table(tile_list, connection_params);
+        num = output_tiles_to_table(get_tiles(), connection_params);
     }
     return num;
 }
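The limit handling in `add_tiles()` can be exercised in isolation with a small model. This is a sketch, not osm2pgsql code: `tile_sink` is a hypothetical stand-in for `expire_output_t`, plain `std::uint64_t` keys stand in for `quadkey_t`, and warnings go to `std::cerr` instead of `log_warn()`. The order of checks follows the diff: a latched overall limit ignores everything, an oversized geometry is dropped on its own, and the overall check uses the cheap size-sum bound described in the code comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for expire_output_t, reduced to the limit logic.
class tile_sink
{
public:
    tile_sink(std::size_t max_geometry, std::size_t max_overall)
    : m_max_geometry(max_geometry), m_max_overall(max_overall)
    {}

    // Called once per geometry, possibly from several threads.
    void add_tiles(std::unordered_set<std::uint64_t> const &dirty)
    {
        std::lock_guard<std::mutex> const guard{m_mutex};

        if (m_limit_reached) {
            return; // overall limit latched earlier in this run
        }

        if (dirty.size() > m_max_geometry) {
            std::cerr << "per-geometry limit reached, geometry ignored\n";
            return; // only this geometry is skipped
        }

        // Upper bound only: tiles present in both sets are counted twice.
        if (dirty.size() + m_tiles.size() > m_max_overall) {
            m_limit_reached = true;
            std::cerr << "overall limit reached for this run\n";
            return;
        }

        m_tiles.insert(dirty.cbegin(), dirty.cend());
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> const guard{m_mutex};
        return m_tiles.size();
    }

    bool limit_reached()
    {
        std::lock_guard<std::mutex> const guard{m_mutex};
        return m_limit_reached;
    }

private:
    std::mutex m_mutex;
    std::unordered_set<std::uint64_t> m_tiles;
    std::size_t m_max_geometry;
    std::size_t m_max_overall;
    bool m_limit_reached = false;
};
```

With limits of 3 per geometry and 5 overall, `{1, 2, 3}` is accepted, `{10, 11, 12, 13}` is dropped as too large, and `{3, 4, 5}` trips the overall limit even though the union would hold exactly 5 tiles; that is the overcount the code comment accepts. Separately, `get_tiles()` in the diff clears `m_tiles` after copying, so a second call returns an empty list. Because `output()` calls it once per destination, an output that had both a filename and a table set would, as written, only hand the tiles to the file.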

src/expire-output.hpp

Lines changed: 68 additions & 5 deletions
@@ -15,9 +15,15 @@
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
+#include <memory>
+#include <mutex>
 #include <string>
+#include <unordered_set>
 #include <utility>
 
+constexpr std::size_t DEFAULT_MAX_TILES_GEOMETRY = 10'000'000;
+constexpr std::size_t DEFAULT_MAX_TILES_OVERALL = 50'000'000;
+
 class pg_conn_t;
 class connection_params_t;
 
@@ -53,9 +59,45 @@ class expire_output_t
     uint32_t maxzoom() const noexcept { return m_maxzoom; }
     void set_maxzoom(uint32_t maxzoom) noexcept { m_maxzoom = maxzoom; }
 
-    std::size_t output(quadkey_list_t const &tile_list,
-                       connection_params_t const &connection_params) const;
+    std::size_t max_tiles_geometry() const noexcept
+    {
+        return m_max_tiles_geometry;
+    }
+
+    void set_max_tiles_geometry(std::size_t max_tiles_geometry) noexcept
+    {
+        m_max_tiles_geometry = max_tiles_geometry;
+    }
+
+    std::size_t max_tiles_overall() const noexcept
+    {
+        return m_max_tiles_overall;
+    }
+
+    void set_max_tiles_overall(std::size_t max_tiles_overall) noexcept
+    {
+        m_max_tiles_overall = max_tiles_overall;
+    }
+
+    bool empty() noexcept;
+
+    void add_tiles(std::unordered_set<quadkey_t> const &dirty_tiles);
+
+    quadkey_list_t get_tiles();
+
+    /**
+     * Write the list of tiles to a database table or file.
+     *
+     * \param connection_params Database connection parameters
+     */
+    std::size_t output(connection_params_t const &connection_params);
+
+    /**
+     * Create table for tiles.
+     */
+    void create_output_table(pg_conn_t const &db_connection) const;
 
+private:
     /**
      * Write the list of tiles to a file.
      *
@@ -75,11 +117,16 @@ class expire_output_t
                           connection_params_t const &connection_params) const;
 
     /**
-     * Create table for tiles.
+     * Access to the m_tiles collection of expired tiles must go through
+     * this mutex, because it can happen from several threads at the same
+     * time. Mutex is wrapped in a shared_ptr to make this class movable so
+     * we can store instances in std::vector.
      */
-    void create_output_table(pg_conn_t const &db_connection) const;
+    std::shared_ptr<std::mutex> m_tiles_mutex = std::make_shared<std::mutex>();
+
+    /// This is where we collect all the expired tiles.
+    std::unordered_set<quadkey_t> m_tiles;
 
-private:
     /// The filename (if any) for output
     std::string m_filename;
 
@@ -95,6 +142,22 @@ class expire_output_t
     /// Zoom level we capture tiles on
     uint32_t m_maxzoom = 0;
 
+    /**
+     * The following two settings are for protecting osm2pgsql from overload as
+     * well as downstream tile expiry mechanisms in case of large changes to
+     * OSM data (possibly from vandalism). They should be large enough to not
+     * trigger in normal use.
+     */
+
+    /// Maximum number of tiles that can be affected by a single geometry.
+    std::size_t m_max_tiles_geometry = DEFAULT_MAX_TILES_GEOMETRY;
+
+    /// Maximum number of tiles that can be affected per run.
+    std::size_t m_max_tiles_overall = DEFAULT_MAX_TILES_OVERALL;
+
+    /// Has the overall tile limit been reached already.
+    bool m_overall_tile_limit_reached = false;
+
 }; // class expire_output_t
 
 #endif // OSM2PGSQL_EXPIRE_OUTPUT_HPP
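The header comment explains why the mutex is held through a `std::shared_ptr`: `std::mutex` is neither copyable nor movable, so a class with a plain mutex member cannot be moved, and `std::vector` has to move (or copy) its elements when it grows. The sketch below checks this with type traits; `plain_mutex_holder` and `output_with_mutex` are hypothetical types, not part of osm2pgsql. One consequence of the design: copies of such an object share a single mutex instead of each getting their own.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <type_traits>
#include <unordered_set>
#include <vector>

// A mutex held by value makes the enclosing type immovable.
struct plain_mutex_holder
{
    std::mutex mutex;
    std::unordered_set<std::uint64_t> tiles;
};

// Holding the mutex through a shared_ptr keeps the type movable
// (and copyable), so it can be stored in a std::vector.
struct output_with_mutex
{
    std::shared_ptr<std::mutex> mutex = std::make_shared<std::mutex>();
    std::unordered_set<std::uint64_t> tiles;

    void add(std::uint64_t tile)
    {
        std::lock_guard<std::mutex> const guard{*mutex};
        tiles.insert(tile);
    }
};

static_assert(!std::is_move_constructible<std::mutex>::value,
              "std::mutex cannot be moved");
static_assert(!std::is_move_constructible<plain_mutex_holder>::value,
              "so neither can a type that holds one by value");
static_assert(std::is_move_constructible<output_with_mutex>::value,
              "a shared_ptr member restores movability");
```

Each default-constructed `output_with_mutex` gets its own mutex from `std::make_shared`, and a reallocating `emplace_back()` merely moves the pointer along with the element.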

src/expire-tiles.cpp

Lines changed: 19 additions & 27 deletions
@@ -27,20 +27,30 @@
 #include "wkb.hpp"
 
 expire_tiles_t::expire_tiles_t(uint32_t max_zoom,
-                               std::shared_ptr<reprojection_t> projection)
-: m_projection(std::move(projection)), m_maxzoom(max_zoom),
-  m_map_width(static_cast<int>(1U << m_maxzoom))
-{}
+                               std::shared_ptr<reprojection_t> projection,
+                               std::size_t max_tiles_geometry)
+: m_projection(std::move(projection)), m_max_tiles_geometry(max_tiles_geometry),
+  m_maxzoom(max_zoom), m_map_width(static_cast<int>(1U << m_maxzoom))
+{
+}
 
 void expire_tiles_t::expire_tile(uint32_t x, uint32_t y)
 {
-    // Only try to insert to tile into the set if the last inserted tile
-    // is different from this tile.
+    if (m_dirty_tiles.size() > m_max_tiles_geometry) {
+        return;
+    }
+
     tile_t const new_tile{m_maxzoom, x, y};
-    if (!m_prev_tile.valid() || m_prev_tile != new_tile) {
-        m_dirty_tiles.insert(new_tile.quadkey());
-        m_prev_tile = new_tile;
+    m_dirty_tiles.insert(new_tile.quadkey());
+}
+
+void expire_tiles_t::commit_tiles(expire_output_t *expire_output)
+{
+    if (!expire_output || m_dirty_tiles.empty()) {
+        return;
     }
+    expire_output->add_tiles(m_dirty_tiles);
+    m_dirty_tiles.clear();
 }
 
 uint32_t expire_tiles_t::normalise_tile_x_coord(int x) const
@@ -281,24 +291,6 @@ quadkey_list_t expire_tiles_t::get_tiles()
     return tiles;
 }
 
-void expire_tiles_t::merge_and_destroy(expire_tiles_t *other)
-{
-    if (m_map_width != other->m_map_width) {
-        throw fmt_error("Unable to merge tile expiry sets when "
-                        "map_width does not match: {} != {}.",
-                        m_map_width, other->m_map_width);
-    }
-
-    if (m_dirty_tiles.empty()) {
-        using std::swap;
-        swap(m_dirty_tiles, other->m_dirty_tiles);
-    } else {
-        m_dirty_tiles.insert(other->m_dirty_tiles.cbegin(),
-                             other->m_dirty_tiles.cend());
-        other->m_dirty_tiles.clear();
-    }
-}
-
 int expire_from_result(expire_tiles_t *expire, pg_result_t const &result,
                        expire_config_t const &expire_config)
 {
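In `expire_tile()` the early return fires only once the set already holds *more* than `m_max_tiles_geometry` entries, so the per-geometry set stops growing at `m_max_tiles_geometry + 1` tiles. That one extra tile is what lets `add_tiles()` see `dirty_tiles.size() > m_max_tiles_geometry` and drop the geometry, while memory for a huge geometry stays bounded. The removed `m_prev_tile` shortcut was presumably only an optimization, since `std::unordered_set::insert` ignores duplicates anyway. The model below is a sketch with a hypothetical `geometry_tiles` class and raw integer keys in place of `tile_t`/`quadkey_t`; for brevity it folds the size check, which the diff performs in `expire_output_t::add_tiles()`, into `commit()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical per-geometry collector modelled on expire_tiles_t.
class geometry_tiles
{
public:
    explicit geometry_tiles(std::size_t max_tiles) : m_max_tiles(max_tiles) {}

    void expire_tile(std::uint64_t key)
    {
        // Stop collecting once we already hold more than the limit;
        // the set therefore never exceeds max_tiles + 1 entries.
        if (m_dirty.size() > m_max_tiles) {
            return;
        }
        m_dirty.insert(key); // duplicates are ignored by the set
    }

    // Hand this geometry's tiles to the shared output, then start fresh.
    void commit(std::unordered_set<std::uint64_t> *output)
    {
        if (output == nullptr || m_dirty.empty()) {
            return; // same early exit as commit_tiles()
        }
        // In the diff this size check lives in expire_output_t::add_tiles().
        if (m_dirty.size() <= m_max_tiles) {
            output->insert(m_dirty.cbegin(), m_dirty.cend());
        }
        m_dirty.clear();
    }

    std::size_t pending() const noexcept { return m_dirty.size(); }

private:
    std::unordered_set<std::uint64_t> m_dirty;
    std::size_t m_max_tiles;
};
```

With a limit of 3, feeding 100 distinct tiles leaves 4 pending; committing them adds nothing to the output, and the next small geometry is handled normally.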

src/expire-tiles.hpp

Lines changed: 10 additions & 9 deletions
@@ -18,6 +18,7 @@
 #include <vector>
 
 #include "expire-config.hpp"
+#include "expire-output.hpp"
 #include "geom.hpp"
 #include "geom-box.hpp"
 #include "logging.hpp"
@@ -31,9 +32,8 @@ class expire_tiles_t
 {
 public:
     expire_tiles_t(uint32_t max_zoom,
-                   std::shared_ptr<reprojection_t> projection);
-
-    bool empty() const noexcept { return m_dirty_tiles.empty(); }
+                   std::shared_ptr<reprojection_t> projection,
+                   std::size_t max_tiles_geometry = DEFAULT_MAX_TILES_GEOMETRY);
 
     bool enabled() const noexcept { return m_maxzoom != 0; }
 
@@ -83,10 +83,13 @@ class expire_tiles_t
     quadkey_list_t get_tiles();
 
     /**
-     * Merge the list of expired tiles in the other object into this
-     * object, destroying the list in the other object.
+     * Must be called after calling expire_tile() one or more times for a
+     * single geometry to "commit" all tiles to be expired for that geometry.
+     *
+     * \param expire_output The expire output to write tiles to. If this is
+     *                      the nullptr, nothing is done.
      */
-    void merge_and_destroy(expire_tiles_t *other);
+    void commit_tiles(expire_output_t* expire_output);
 
 private:
     /**
@@ -113,11 +116,9 @@ class expire_tiles_t
     /// This is where we collect all the expired tiles.
     std::unordered_set<quadkey_t> m_dirty_tiles;
 
-    /// The tile which has been added last to the unordered set.
-    tile_t m_prev_tile;
-
     std::shared_ptr<reprojection_t> m_projection;
 
+    std::size_t m_max_tiles_geometry;
     uint32_t m_maxzoom;
     int m_map_width;
 
src/flex-lua-expire-output.cpp

Lines changed: 41 additions & 1 deletion
@@ -62,6 +62,26 @@ create_expire_output(lua_State *lua_state, std::string const &default_schema,
     }
     lua_pop(lua_state, 1); // "minzoom"
 
+    // optional "max_tiles_geometry" field
+    auto const max_tiles_geometry = luaX_get_table_optional_uint64(
+        lua_state, "max_tiles_geometry", -1,
+        "The 'max_tiles_geometry' field in a expire output", 1, (4ULL << 20ULL),
+        "1 and 4 << 20");
+    if (max_tiles_geometry > 0) {
+        new_expire_output.set_max_tiles_geometry(max_tiles_geometry);
+    }
+    lua_pop(lua_state, 1); // "max_tiles_geometry"
+
+    // optional "max_tiles_overall" field
+    auto const max_tiles_overall = luaX_get_table_optional_uint64(
+        lua_state, "max_tiles_overall", -1,
+        "The 'max_tiles_overall' field in a expire output", 1, (4ULL << 20ULL),
+        "1 and 4 << 20");
+    if (max_tiles_overall > 0) {
+        new_expire_output.set_max_tiles_overall(max_tiles_overall);
+    }
+    lua_pop(lua_state, 1); // "max_tiles_overall"
+
     return new_expire_output;
 }
 
@@ -71,6 +91,8 @@ TRAMPOLINE_WRAPPED_OBJECT(expire_output, maxzoom)
 TRAMPOLINE_WRAPPED_OBJECT(expire_output, minzoom)
 TRAMPOLINE_WRAPPED_OBJECT(expire_output, schema)
 TRAMPOLINE_WRAPPED_OBJECT(expire_output, table)
+TRAMPOLINE_WRAPPED_OBJECT(expire_output, max_tiles_geometry)
+TRAMPOLINE_WRAPPED_OBJECT(expire_output, max_tiles_overall)
 
 } // anonymous namespace
 
@@ -106,7 +128,11 @@ void lua_wrapper_expire_output_t::init(lua_State *lua_state)
          {"maxzoom", lua_trampoline_expire_output_maxzoom},
          {"minzoom", lua_trampoline_expire_output_minzoom},
          {"schema", lua_trampoline_expire_output_schema},
-         {"table", lua_trampoline_expire_output_table}});
+         {"table", lua_trampoline_expire_output_table},
+         {"max_tiles_geometry",
+          lua_trampoline_expire_output_max_tiles_geometry},
+         {"max_tiles_overall",
+          lua_trampoline_expire_output_max_tiles_overall}});
 }
 
 int lua_wrapper_expire_output_t::tostring() const
@@ -150,3 +176,17 @@ int lua_wrapper_expire_output_t::table() const noexcept
     luaX_pushstring(lua_state(), self().table());
     return 1;
 }
+
+int lua_wrapper_expire_output_t::max_tiles_geometry() const noexcept
+{
+    lua_pushinteger(lua_state(),
+                    static_cast<lua_Integer>(self().max_tiles_geometry()));
+    return 1;
+}
+
+int lua_wrapper_expire_output_t::max_tiles_overall() const noexcept
+{
+    lua_pushinteger(lua_state(),
+                    static_cast<lua_Integer>(self().max_tiles_overall()));
+    return 1;
+}
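Both new options are read through `luaX_get_table_optional_uint64()` with a lower bound of 1 and an upper bound of `4ULL << 20ULL`. The sketch below restates that range check on its own. `read_limit()` is hypothetical, and it assumes, as the `> 0` tests in the diff suggest, that an absent field comes back as 0 and leaves the default untouched. Worth noticing: `4 << 20` is 4,194,304, which is smaller than both defaults (10,000,000 and 50,000,000), so with the range as written a Lua config can lower the limits but cannot set them to the default values or above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Bounds as passed in the diff.
constexpr std::uint64_t min_limit = 1;
constexpr std::uint64_t max_limit = 4ULL << 20ULL; // 4,194,304

// Hypothetical helper: validate an optional user setting and return the
// value to use. An absent field keeps the current default.
std::uint64_t read_limit(std::optional<std::uint64_t> field,
                         std::uint64_t current_default,
                         std::string const &name)
{
    if (!field) {
        return current_default;
    }
    if (*field < min_limit || *field > max_limit) {
        throw std::out_of_range{"The '" + name +
                                "' field in an expire output must be "
                                "between 1 and 4 << 20"};
    }
    return *field;
}
```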

src/flex-lua-expire-output.hpp

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@ class lua_wrapper_expire_output_t : public lua_wrapper_base_t<expire_output_t>
     int minzoom() const noexcept;
     int schema() const noexcept;
     int table() const noexcept;
+    int max_tiles_geometry() const noexcept;
+    int max_tiles_overall() const noexcept;
 
 }; // class lua_wrapper_expire_output_t

src/flex-table-column.cpp

Lines changed: 9 additions & 4 deletions
@@ -202,13 +202,18 @@ void flex_table_column_t::add_expire(expire_config_t const &config)
     m_expires.push_back(config);
 }
 
-void flex_table_column_t::do_expire(geom::geometry_t const &geom,
-                                    std::vector<expire_tiles_t> *expire) const
+void flex_table_column_t::do_expire(
+    geom::geometry_t const &geom, std::vector<expire_tiles_t> *expire,
+    std::vector<expire_output_t> *expire_outputs) const
 {
     assert(expire);
+    assert(expire_outputs);
+
     for (auto const &expire_config : m_expires) {
         assert(expire_config.expire_output < expire->size());
-        (*expire)[expire_config.expire_output].from_geometry(geom,
-                                                              expire_config);
+        auto &expire_tiles = expire->at(expire_config.expire_output);
+        expire_tiles.from_geometry(geom, expire_config);
+        expire_tiles.commit_tiles(
+            &expire_outputs->at(expire_config.expire_output));
     }
 }
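The new `do_expire()` pairs every `from_geometry()` with an immediate `commit_tiles()` against the expire output with the same index, so each geometry is checked against the limits of each output separately: a geometry can be dropped for one output and still be accepted by another with a higher limit. Switching from `operator[]` to `.at()` also means a bad index throws `std::out_of_range` even in release builds where the `assert` is compiled out. The reduced model below is only a sketch: `collector`, `sink`, and the free function `do_expire()` are hypothetical, and a "geometry" is represented just by the number of tiles it touches.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for expire_output_t: only the per-geometry limit.
struct sink
{
    std::size_t max_per_geometry;
    std::size_t accepted = 0;

    void add(std::size_t n)
    {
        if (n <= max_per_geometry) {
            accepted += n;
        }
    }
};

// Hypothetical stand-in for expire_tiles_t.
struct collector
{
    std::size_t pending = 0;

    void from_geometry(std::size_t tiles_touched) { pending = tiles_touched; }

    void commit(sink *out)
    {
        if (out == nullptr || pending == 0) {
            return;
        }
        out->add(pending);
        pending = 0;
    }
};

// Mirrors the loop in do_expire(): one collector/sink pair per config index.
void do_expire(std::size_t tiles_touched,
               std::vector<std::size_t> const &configs,
               std::vector<collector> *collectors, std::vector<sink> *sinks)
{
    for (auto const index : configs) {
        auto &c = collectors->at(index); // throws on a bad index
        c.from_geometry(tiles_touched);
        c.commit(&sinks->at(index));
    }
}
```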
