https://codepoints.net/U+1F4F1
test
Typing a URL as < a h r e f = "url" >url< / a > breaks the parser.
< a h r e f = "url" >some text< / a > works.
Another bug in the automatic link generator.
There are two bugs:
1. It removes the preceding space, making the link stick to the previous text.
2. It grabs trailing punctuation (tested with a comma) as part of the link (see the sketch after the test links below).
1. automatic link test: https://storiesonline.net/
2. automatic link test, trailing dot: https://storiesonline.net/.
3. automatic link test, trailing comma: https://storiesonline.net/,
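Something along these lines would handle both bugs (an untested sketch with a hypothetical function name, not your actual parser): keep the captured whitespace in the output, and move trailing punctuation outside the tag before building it.

<?php
// Untested sketch (hypothetical name, not your parser): keep the leading
// whitespace and move trailing punctuation outside the generated <a> tag.
function linkify_sketch(string $text): string
{
    return preg_replace_callback(
        '~(^|\s)(https?://[^\s<]+)~i',
        function (array $m): string {
            $url = $m[2];
            $trailing = '';
            // Peel trailing . , ; : ! ? ) off the URL, but keep a ")" that
            // closes a "(" inside the URL (e.g. wiki-style links).
            while (preg_match('~[.,;:!?)]$~', $url)) {
                if (substr($url, -1) === ')'
                    && substr_count($url, '(') >= substr_count($url, ')')) {
                    break;
                }
                $trailing = substr($url, -1) . $trailing;
                $url = substr($url, 0, -1);
            }
            // $m[1] is the whitespace before the URL: re-emit it so the link
            // no longer sticks to the previous word. Commas *inside* the URL
            // are untouched, only trailing ones are moved out of the link.
            $safe = htmlspecialchars($url, ENT_QUOTES);
            return $m[1] . '<a href="' . $safe . '">' . $safe . '</a>' . $trailing;
        },
        $text
    );
}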
If you put the PHP code somewhere like Pastebin, I can take a look.
Alternatively, take inspiration from the masters of the problem: WordPress.
https://developer.wordpress.org/reference/functions/make_clickable/
You'll see the regex from hell that solves almost all the exceptions.
You can either adapt line 2146 to allow only http(s)? as a protocol prefix and rewrite the callback (which is fairly obvious), or follow the dependencies.
I would advise taking a look at the dependencies, as they promote security in a lot of places by normalizing and limiting the accepted data.
But you'll quickly end up deep in their filter architecture; it's related to plug-in support and totally irrelevant for you.
Here are some dependencies:
_split_str_by_whitespace,
_make_url_clickable_cb,
esc_url,
_deep_replace,
clean_url,
wp_kses_bad_protocol,
wp_kses_no_null,
wp_kses_bad_protocol_once,
wp_kses_bad_protocol_once2,
wp_kses_decode_entities,
_wp_kses_decode_entities_chr,
_wp_kses_decode_entities_chr_hexdec,
wp_kses_normalize_entities,
wp_kses_named_entities,
wp_kses_normalize_entities2,
wp_kses_normalize_entities3,
valid_unicode,
wp_allowed_protocols,
kses_allowed_protocols,
...
Note that KSES is a recursive acronym that stands for "KSES Strips Evil Scripts".
So those are of particular interest to you.
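To give a flavour of what those helpers do (this is a simplified sketch of the idea, nothing like the real WordPress code): decode entities and strip junk repeatedly until the string stops changing, and only then compare the scheme against a whitelist.

<?php
// Simplified sketch of the kses idea (not the real WordPress code): decode
// entities and strip NULs/control characters until the string stops changing,
// so things like "java&#115;cript:" or "java\0script:" can't hide the scheme,
// then check the scheme against a whitelist.
function sketch_safe_protocol(string $url, array $allowed = ['http', 'https']): string
{
    do {
        $previous = $url;
        $url = html_entity_decode($url, ENT_QUOTES | ENT_HTML5);
        $url = preg_replace('/[\x00-\x20]+/', '', $url);
    } while ($url !== $previous);

    if (preg_match('~^([a-z][a-z0-9+.\-]*):~i', $url, $m)
        && !in_array(strtolower($m[1]), $allowed, true)) {
        return '';  // disallowed scheme: drop the whole URL
    }
    return $url;
}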
And that, among other things, is why I told you that securing a forum is a huge task, and that I barely scratched the surface with my security tests. The number of attack vectors through encoding, invalid Unicode, and entities is incredible.
You know, the simpler solution is to not try to make anything clickable :)
Anyway, I've made some changes.
Much better.
Oddly, Firefox gives problems on URLs containing &: they are transformed into HTML entities in the href, which was mandatory for HTML4/XHTML; with HTML5 it officially shouldn't be required but should be tolerated.
Apparently with HTML5 and Firefox they are passed as entities to the server, which then may fail. I didn't test with other browsers.
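For what it's worth, &amp; inside an href attribute is the correct form, and a conforming browser decodes it back to & before making the request; if the server really receives "amp;" in the query string, the usual culprit is the generator escaping the URL twice. A small sketch (hypothetical URL) of escaping it exactly once:

<?php
// Sketch: escape the href exactly once. With double_encode = false, an "&"
// that is already part of an entity is left alone, so the attribute contains
// "&amp;" and the browser sends a plain "&" to the server.
$url  = 'https://example.com/page?a=1&b=2';                  // hypothetical
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);  // no double encoding

echo '<a href="' . $href . '">' . $href . '</a>';
// -> <a href="https://example.com/page?a=1&amp;b=2">https://example.com/page?a=1&amp;b=2</a>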
Obviously, note however that removing all links won't eliminate all attack vectors ;)
Commas inside a URL, automatic detection test:
http://domain.com/something/0,123,3.html
Not really, the [] are stripped from the URL.
see:
/library/categ.php?key[]=humor&storyType=&contRate[]=5&iip=1&lib=&rf=&ff=&author=&score=&minSize=&maxSize=&p=&sort_field=story_score&sort_order=desc&lc=AND&cmd=Search
becomes:
https://storiesonline.net/library/categ.php?key=humor&storyType=&contRate=5&iip=1&lib=&rf=&ff=&author=&score=&minSize=&maxSize=&p=&sort_field=story_score&sort_order=desc&lc=AND&cmd=Search
which doesn't work.
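That points at the character class in the URL regex: [ and ] are reserved characters, but PHP array-style query strings (key[]=humor) use them literally, so the matcher has to accept them and, ideally, percent-encode them in the href rather than drop them. A rough sketch:

<?php
// Sketch: let the URL pattern accept [ and ], then percent-encode them in
// the href; PHP decodes %5B/%5D back to [], so key[]=humor still arrives
// as an array parameter on the server side.
$text = 'https://storiesonline.net/library/categ.php?key[]=humor&contRate[]=5&cmd=Search';

$linked = preg_replace_callback(
    '~https?://[^\s<"]+~i',            // [ and ] are not excluded here
    function (array $m): string {
        $href = str_replace(['[', ']'], ['%5B', '%5D'], $m[0]);
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
             . htmlspecialchars($m[0], ENT_QUOTES) . '</a>';
    },
    $text
);

echo $linked;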
Yes, those regexps suck real big time.