|
1 | | -# Url-knife [](https://travis-ci.org/patternknife/url-knife) [](https://www.npmjs.com/package/url-knife) [](https://www.jsdelivr.com/package/gh/patternknife/url-knife) [](https://bundlephobia.com/result?p=url-knife) |
| 1 | +# Url-knife [](https://www.npmjs.com/package/url-knife) [](https://www.jsdelivr.com/package/gh/patternknife/url-knife) [](https://bundlephobia.com/result?p=url-knife) |
2 | 2 | ## Overview |
3 | 3 | Extract and decompose (fuzzy) URLs (including email addresses, which are conceptually part of URLs) from text with robust patterns.
4 | 4 |
|
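For quick orientation before the chapters referenced below, here is a minimal usage sketch. The `import Pattern from 'url-knife'` line is taken from the hunk that follows; the `Pattern.TextArea.extractAllUrls` call is an assumption inferred from the chapter titles and may not match the library's actual API.

``` javascript
// Minimal sketch (assumed API; see the chapters below for the documented calls).
import Pattern from 'url-knife';

var textStr = 'Have you visited https://www.google.com/maps?q=seoul or mailed abc@daum.net?';

// Hypothetical call: extract every URL and email found in plain text.
var urls = Pattern.TextArea.extractAllUrls(textStr);

console.log(urls); // an array of result objects, one per detected URL or email
```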
@@ -35,9 +35,7 @@ import Pattern from 'url-knife'; |
35 | 35 |
|
36 | 36 | [Chapter 3. Extract URIs with certain names](#chapter-3-extract-uris-with-certain-names) |
37 | 37 |
|
38 | | -[Chapter 4. Extract all fuzzy URLs](#chapter-4-extract-all-fuzzy-urls) (False positives detected) |
39 | | - |
40 | | -[Chapter 5. Extract all URLs in raw HTML or XML](#chapter-5-extract-all-urls-in-raw-html-or-xml) |
| 38 | +[Chapter 4. Extract all URLs in raw HTML or XML](#chapter-4-extract-all-urls-in-raw-html-or-xml) |
41 | 39 |
|
42 | 40 |
|
43 | 41 | #### Chapter 1. Normalize or parse one URL |
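A minimal sketch of what this chapter covers. The helper names `Pattern.UrlArea.normalizeUrl` and `Pattern.UrlArea.parseUrl` are placeholders assumed for illustration; they are not confirmed by this diff, so check the chapter body for the documented calls.

``` javascript
// Hypothetical sketch for Chapter 1 (function names assumed, not confirmed here).
import Pattern from 'url-knife';

// Normalize one fuzzy, possibly malformed URL into a canonical form.
var normalized = Pattern.UrlArea.normalizeUrl('htp://-www.ex ample.com//wpstyle/??p=364');

// Parse one URL into its components (protocol, host, port, path, params...).
var parsed = Pattern.UrlArea.parseUrl('https://google.com/abc/777?a=5&b=7');

console.log(normalized, parsed);
```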
@@ -450,31 +448,8 @@ var sampleText = 'https://google.com/abc/777?a=5&b=7 abc/def 333/kak abc/55에 |
450 | 448 | } |
451 | 449 | ] |
452 | 450 | ``` |
453 | | - |
454 | | -#### Chapter 4. Extract all fuzzy URLs |
455 | | -##### The strongest url extracting method of URL-knife in natural language texts. However, this does not detect intranets due to false positives. If you need to extract intranets, go back to the Chapter 2 above. |
456 | | - |
457 | | -``` javascript |
458 | | -var textStr = '142 .42.1.1:8080 123.45 xtp://--[::1]:8000에서 h ttpp ;//-www.ex ample;com -/wpstyle/??p=3?6/4&x=5/3 in the ssh h::/;/ww.example.com/wpstyle/?p=364 is ok ' + |
459 | | - 'h ttp:/://132 .42.,1.1 HT TP:// foo, co,.kr/blah_blah_(wikipedia) https://www.google .org :8005/maps/place/USA/@36.2218457,... tnae1ver.co. jp;8000on the internet Asterisk\n ' + |
460 | | - 'the packed1book.net. 가나다@apacbook.ac.kr fakeshouldnotbedetected.url?abc=fake s5houl7十七日dbedetected.jp?japan=go&html=<span>가나다@pacbook.travelersinsurance</span>;' + |
461 | | - ' abc,com//ad/fg/?kk=5 abc@daum.net Have you visited http://agoasidaio.ac.kr?abd=55...,.&kkk=5rk.,, ' + |
462 | | - 'Have <b>you</b> visited goasidaio.ac.kr?abd=5hell0?5...&kkk=5rk.,. '; |
463 | | - |
464 | | - /** |
465 | | - * @brief |
466 | | - * Distill all urls including fuzzy matched ones from normal text |
467 | | - * @author Andrew Kang |
468 | | - * @param textStr string required |
469 | | - |
470 | | - |
471 | | - var urls = Pattern.TextArea.extractAllFuzzyUrls(textStr) |
472 | | - ``` |
473 | | - ###### console.log() |
474 | | -<a href="https://jsfiddle.net/AndrewKang/p0tc4ovb/" target="_blank">LIVE DEMO</a> |
475 | | -
|
476 | 451 |
|
477 | | -#### Chapter 5. Extract all URLs in raw HTML or XML |
| 452 | +#### Chapter 4. Extract all URLs in raw HTML or XML |
478 | 453 |
|
479 | 454 | ``` javascript |
480 | 455 | // The sample of 'XML (HTML)' |
@@ -538,4 +513,4 @@ var urls = PatternExtractor.XmlArea.extractAllUrls(xmlStr); |
538 | 513 | ] |
539 | 514 | ``` |
540 | 515 |
|
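For reference, a self-contained sketch of the Chapter 4 call. The `XmlArea.extractAllUrls(xmlStr)` invocation is taken from the hunk above; it is written there against a `PatternExtractor` name, while the README imports the module as `Pattern` elsewhere, so the import alias below is an assumption.

``` javascript
// Sketch of Chapter 4 usage (import alias assumed; the extractAllUrls call appears in the diff above).
import PatternExtractor from 'url-knife';

// A small stand-in for the truncated XML (HTML) sample.
var xmlStr =
    '<body><a href="https://www.example.com/path?q=1">link</a>' +
    '<img src="http://cdn.example.org/img.png"/></body>';

// Extract every URL found in the markup's attribute values and text nodes.
var urls = PatternExtractor.XmlArea.extractAllUrls(xmlStr);

console.log(urls); // an array of result objects like the output shown above
```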
541 | | -Please inform me of more sophisticated patterns you need by leaving issues on Github or emailing me at studypurpose@naver.com. |
| 516 | +If you need more sophisticated patterns, please open an issue or email me at studypurpose@naver.com.