Commit 850a6a6
Don't remove excluded-on-redirect URLs from seen list (#936)
Fixes #937
- Don't remove URLs from seen list
- Add a new excluded key, and add URLs that are to be excluded (out-of-scope
  on redirect) to the excluded set. The size of this set gives the number of
  URLs excluded in this way, which is used to compute the number of
  discovered URLs.
- Don't write urn:pageinfo records for excluded pages, in addition to not
  writing them to pages/extraPages.jsonl

1 parent 4a703cd commit 850a6a6
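
The bookkeeping described in the commit message can be sketched as follows. This is a minimal in-memory illustration, not the code from this commit: the class `CrawlStateSketch`, its method names, and the formula "discovered = seen minus excluded" are all assumptions made for the example, and the actual change may store these sets differently.

```typescript
// Hypothetical in-memory stand-in for the crawl state; all names are
// illustrative and are not taken from the commit.
class CrawlStateSketch {
  private seen = new Set<string>();
  private excluded = new Set<string>();

  // Record a URL the first time it is queued. Returns false for a URL
  // that is already in the seen set, so it is not queued twice.
  markSeen(url: string): boolean {
    if (this.seen.has(url)) {
      return false;
    }
    this.seen.add(url);
    return true;
  }

  // Called when a page turns out to be out of scope after a redirect.
  // The URL stays in the seen set (so it is never re-queued) and is
  // additionally recorded in the excluded set.
  markExcluded(url: string): void {
    this.seen.add(url);
    this.excluded.add(url);
  }

  // Excluded pages get no page-list entry and no pageinfo record.
  shouldWriteRecords(url: string): boolean {
    return !this.excluded.has(url);
  }

  // Assumed formula: every excluded URL is also in the seen set, so the
  // difference of the two sizes counts the URLs that remain in scope.
  numDiscovered(): number {
    return this.seen.size - this.excluded.size;
  }
}

const state = new CrawlStateSketch();
state.markSeen("https://example.com/a");
state.markSeen("https://example.com/b");
state.markExcluded("https://example.com/b"); // redirected out of scope

console.log(state.markSeen("https://example.com/b")); // false: still seen
console.log(state.numDiscovered()); // 1
```

Keeping the excluded URL in the seen set is what prevents it from being queued again, while the separate excluded set keeps the discovered count from being inflated by it.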
4 files changed: +79 -8 lines changed

File tree
- src
- util
- tests