https://codepoints.net/U+1F4F1
test
Typing a URL as < a h r e f = "url" >url< / a > breaks the parser.
< a h r e f = "url" >some text< / a > works.
Another bug in the automatic link generator.
There are two bugs:
1. It removes the preceding space, making the link stick to the previous text.
2. It grabs trailing punctuation (tested with a comma) as part of the link (see the sketch after the test links below).
1. automatic link test: https://storiesonline.net/
2. automatic link test, trailing dot: https://storiesonline.net/.
3. automatic link test, trailing comma: https://storiesonline.net/,
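Something along these lines would handle both bugs (an untested sketch with a hypothetical function name, not your actual parser): keep the captured whitespace in the output, and move trailing punctuation outside the tag before building it.

<?php
// Untested sketch (hypothetical name, not your parser): keep the leading
// whitespace and move trailing punctuation outside the generated <a> tag.
function linkify_sketch(string $text): string
{
    return preg_replace_callback(
        '~(^|\s)(https?://[^\s<]+)~i',
        function (array $m): string {
            $url = $m[2];
            $trailing = '';
            // Peel trailing . , ; : ! ? ) off the URL, but keep a ")" that
            // closes a "(" inside the URL (e.g. wiki-style links).
            while (preg_match('~[.,;:!?)]$~', $url)) {
                if (substr($url, -1) === ')'
                    && substr_count($url, '(') >= substr_count($url, ')')) {
                    break;
                }
                $trailing = substr($url, -1) . $trailing;
                $url = substr($url, 0, -1);
            }
            // $m[1] is the whitespace before the URL: re-emit it so the link
            // no longer sticks to the previous word. Commas *inside* the URL
            // are untouched, only trailing ones are moved out of the link.
            $safe = htmlspecialchars($url, ENT_QUOTES);
            return $m[1] . '<a href="' . $safe . '">' . $safe . '</a>' . $trailing;
        },
        $text
    );
}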
If you put the PHP code somewhere like Pastebin, I can take a look.
Alternatively, take inspiration from the masters of the problem: WordPress.
https://developer.wordpress.org/reference/functions/make_clickable/
You'll see the regex from hell that solves almost all the exceptions.
You can either adapt line 2146 to allow only http(s)? as a protocol prefix and rewrite the callback (which is fairly obvious), or follow the dependencies.
I would advise taking a look at the dependencies, as they promote security in a lot of places by normalizing and limiting the accepted data.
But you'll quickly end up deep in their filter architecture; it's related to plug-in support and totally irrelevant for you.
Here are some dependencies:
_split_str_by_whitespace,
_make_url_clickable_cb,
esc_url,
_deep_replace,
clean_url,
wp_kses_bad_protocol,
wp_kses_no_null,
wp_kses_bad_protocol_once,
wp_kses_bad_protocol_once2,
wp_kses_decode_entities,
_wp_kses_decode_entities_chr,
_wp_kses_decode_entities_chr_hexdec,
wp_kses_normalize_entities,
wp_kses_named_entities,
wp_kses_normalize_entities2,
wp_kses_normalize_entities3,
valid_unicode,
wp_allowed_protocols,
kses_allowed_protocols,
...
Note that KSES is a recursive acronym that stands for "KSES Strips Evil Scripts".
So those are of particular interest to you.
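To give a flavour of what those helpers do (this is a simplified sketch of the idea, nothing like the real WordPress code): decode entities and strip junk repeatedly until the string stops changing, and only then compare the scheme against a whitelist.

<?php
// Simplified sketch of the kses idea (not the real WordPress code): decode
// entities and strip NULs/control characters until the string stops changing,
// so things like "java&#115;cript:" or "java\0script:" can't hide the scheme,
// then check the scheme against a whitelist.
function sketch_safe_protocol(string $url, array $allowed = ['http', 'https']): string
{
    do {
        $previous = $url;
        $url = html_entity_decode($url, ENT_QUOTES | ENT_HTML5);
        $url = preg_replace('/[\x00-\x20]+/', '', $url);
    } while ($url !== $previous);

    if (preg_match('~^([a-z][a-z0-9+.\-]*):~i', $url, $m)
        && !in_array(strtolower($m[1]), $allowed, true)) {
        return '';  // disallowed scheme: drop the whole URL
    }
    return $url;
}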
And that, among other things, is why I told you that securing a forum is a huge task, and that I barely scratched the surface with my security tests. The number of attack vectors through encoding, invalid Unicode, and entities is incredible.
You know, the simpler solution is to not try to make anything clickable :)
Anyway, I've made some changes.
Much better.
Oddly, Firefox gives problems on URLs containing &: they are transformed into HTML entities in the href, which was mandatory for HTML4/XHTML; with HTML5 it officially shouldn't be required but should be tolerated.
Apparently with HTML5 and Firefox they are passed as entities to the server, which then may fail. I didn't test with other browsers.
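For what it's worth, &amp; inside an href attribute is the correct form, and a conforming browser decodes it back to & before making the request; if the server really receives "amp;" in the query string, the usual culprit is the generator escaping the URL twice. A small sketch (hypothetical URL) of escaping it exactly once:

<?php
// Sketch: escape the href exactly once. With double_encode = false, an "&"
// that is already part of an entity is left alone, so the attribute contains
// "&amp;" and the browser sends a plain "&" to the server.
$url  = 'https://example.com/page?a=1&b=2';                  // hypothetical
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);  // no double encoding

echo '<a href="' . $href . '">' . $href . '</a>';
// -> <a href="https://example.com/page?a=1&amp;b=2">https://example.com/page?a=1&amp;b=2</a>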
Obviously, note however that removing all links won't eliminate all attack vectors ;)
Commas inside a URL, automatic detection test:
http://domain.com/something/0,123,3.html
Not really, the [] are stripped from the URL.
see:
/library/categ.php?key[]=humor&storyType=&contRate[]=5&iip=1&lib=&rf=&ff=&author=&score=&minSize=&maxSize=&p=&sort_field=story_score&sort_order=desc&lc=AND&cmd=Search
becomes:
https://storiesonline.net/library/categ.php?key=humor&storyType=&contRate=5&iip=1&lib=&rf=&ff=&author=&score=&minSize=&maxSize=&p=&sort_field=story_score&sort_order=desc&lc=AND&cmd=Search
which doesn't work.
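That points at the character class in the URL regex: [ and ] are reserved characters, but PHP array-style query strings (key[]=humor) use them literally, so the matcher has to accept them and, ideally, percent-encode them in the href rather than drop them. A rough sketch:

<?php
// Sketch: let the URL pattern accept [ and ], then percent-encode them in
// the href; PHP decodes %5B/%5D back to [], so key[]=humor still arrives
// as an array parameter on the server side.
$text = 'https://storiesonline.net/library/categ.php?key[]=humor&contRate[]=5&cmd=Search';

$linked = preg_replace_callback(
    '~https?://[^\s<"]+~i',            // [ and ] are not excluded here
    function (array $m): string {
        $href = str_replace(['[', ']'], ['%5B', '%5D'], $m[0]);
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
             . htmlspecialchars($m[0], ENT_QUOTES) . '</a>';
    },
    $text
);

echo $linked;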
Yes, those regexps suck real big time.