Forum: Bug Report and Feature Requests

broken link parsing

Gauthier

https://codepoints.net/U+1F4F1

https://codepoints.net/U+1F4F1

test

Gauthier

@Gauthier

Typing a URL as <a href="url">url</a> breaks the parser.

<a href="url">some text</a> works.
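A minimal Python sketch of one way to avoid this, assuming the parser runs its bare-URL regex over the whole post, including the text already inside an <a> element. The idea is to split the post on existing anchors and linkify only the text between them. URL_RE, ANCHOR_RE and autolink are hypothetical names, not the site's code.

```python
import re

# Hypothetical bare-URL pattern: http(s):// followed by characters that
# cannot end an HTML attribute or tag.
URL_RE = re.compile(r'https?://[^\s<>"]+')
# A complete <a ...>...</a> element, matched non-greedily across lines.
ANCHOR_RE = re.compile(r'<a\b[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL)


def _wrap(m: re.Match) -> str:
    url = m.group(0)
    return f'<a href="{url}">{url}</a>'


def autolink(post: str) -> str:
    """Turn bare URLs into links, copying existing <a> elements unchanged."""
    out = []
    pos = 0
    for m in ANCHOR_RE.finditer(post):
        # Plain text before this anchor: safe to linkify.
        out.append(URL_RE.sub(_wrap, post[pos:m.start()]))
        # The existing anchor is copied verbatim, so neither its href nor
        # its visible URL text is wrapped a second time.
        out.append(m.group(0))
        pos = m.end()
    # Plain text after the last anchor.
    out.append(URL_RE.sub(_wrap, post[pos:]))
    return "".join(out)
```

Matching HTML with a regex is fragile (malformed or nested tags); an HTML parser is the sturdier option, but the split-and-skip idea is the same.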

Lazeez Jiddan (Webmaster)

@Gauthier

Fixed.

Gauthier

@Lazeez Jiddan (Webmaster)

Yes, it works now.

Gauthier

@Gauthier

A new link-parsing bug; this one affects post editing:

When linking to SOL, the link in the edited post is root-relative (without the http://storiesonline.net prefix). Consequently, on save, the link is stripped from the post because it lacks http://.

I guess the reason is to make the link work on the secure alternate domain.

In that case, you have to alter the regexp to accept links starting with http://, https://, or /.
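A minimal sketch of such a check, assuming the sanitizer decides whether to keep a link by matching its href against a pattern. HREF_RE and href_allowed are hypothetical names. The (?!/) lookahead is an optional design choice: it stops protocol-relative URLs such as //example.com from passing through the root-relative branch.

```python
import re

# Accept absolute http(s) links and root-relative links ("/path"),
# but not protocol-relative ones ("//host").
HREF_RE = re.compile(r'^(?:https?://|/(?!/))', re.IGNORECASE)


def href_allowed(href: str) -> bool:
    """Return True if the href starts with an accepted prefix (hypothetical policy)."""
    return HREF_RE.match(href.strip()) is not None
```

Drop the lookahead if protocol-relative links should be accepted as well.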

Gauthier

@Gauthier

Another bug in the automatic link generator.
There are two bugs:
1. It removes the preceding space, making the link stick to the previous text.
2. It grabs trailing punctuation (tested with a comma) as part of the link.
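A minimal sketch addressing both bugs, assuming a regex-based autolinker (BARE_URL_RE, TRAILING_PUNCT and autolink_text are hypothetical names). The negative lookbehind requires the URL to start at the beginning of the text or after whitespace or "(" without consuming that character, so the space is never removed. Trailing punctuation is trimmed from the match and re-emitted after the closing tag.

```python
import re

# Characters treated as sentence punctuation when they end a URL.
TRAILING_PUNCT = ".,;:!?)"

# A bare URL at the start of the text or right after whitespace or "(".
# The lookbehind only checks the previous character; it does not consume
# it, so the preceding space stays in the output.
BARE_URL_RE = re.compile(r'(?<![^\s(])https?://[^\s<>"]+')


def _link(m: re.Match) -> str:
    url = m.group(0)
    trimmed = url.rstrip(TRAILING_PUNCT)
    tail = url[len(trimmed):]  # punctuation moved back outside the link
    return f'<a href="{trimmed}">{trimmed}</a>{tail}'


def autolink_text(text: str) -> str:
    """Linkify bare URLs in plain text (hypothetical helper)."""
    return BARE_URL_RE.sub(_link, text)
```

Because only trailing characters are trimmed, commas and colons inside a URL (such as 0,123,3.html or /s/10959:168869) are kept. Stripping ")" does break URLs that legitimately end in a balanced parenthesis; handling those needs an extra balance check.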

Lazeez Jiddan (Webmaster)

@Gauthier

Made some changes that I hope help. This URL parsing thing is tricky.

Gauthier

@Lazeez Jiddan (Webmaster)

1 automatic link test http://storiesonline.net/

2 automatic link test trailing dot http://storiesonline.net/.

3 automatic link test trailing comma http://storiesonline.net/,

Gauthier

@Lazeez Jiddan (Webmaster)

Commas inside a URL (automatic detection):
http://domain.com/something/0,123,3.html

Gauthier

@Lazeez Jiddan (Webmaster)

Still a few issues, but it's already much better.

Gauthier

@Gauthier

test http://storiesonline.net/s/10959:168869 test

Gauthier

@Gauthier

test http://storiesonline.net/. test

Gauthier

@Gauthier

If you put the PHP code somewhere like Pastebin, I can take a look.

Alternatively, take inspiration from the masters of the problem: WordPress.

https://developer.wordpress.org/reference/functions/make_clickable/

You'll see the regex from hell that handles almost all the exceptions.

You can either adapt line 2146 to allow only http(s)? as a protocol prefix and rewrite the callback (fairly straightforward), or follow the dependencies.

I would advise taking a look at the dependencies, as they promote security in a lot of places by normalizing and limiting accepted data.

But you'll quickly run into their filter architecture, which is related to plug-in support and totally irrelevant for you.

Here are some dependencies:

_split_str_by_whitespace,
_make_url_clickable_cb,
esc_url,
_deep_replace,
clean_url,
wp_kses_bad_protocol,
wp_kses_no_null,
wp_kses_bad_protocol_once,
wp_kses_bad_protocol_once2,
wp_kses_decode_entities,
_wp_kses_decode_entities_chr,
_wp_kses_decode_entities_chr_hexdec,
wp_kses_normalize_entities,
wp_kses_named_entities,
wp_kses_normalize_entities2,
wp_kses_normalize_entities3,
valid_unicode,
wp_allowed_protocols,
kses_allowed_protocols,
...

Note that KSES is a recursive acronym that stands for "KSES Strips Evil Scripts".

So those are of particular interest to you.
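A minimal sketch of the normalize-then-allowlist idea behind functions such as wp_kses_bad_protocol, written from scratch in Python rather than ported from WordPress. The allowlist, the root-relative rule and the name is_safe_href are assumptions for illustration. Entities are decoded and control characters removed before the scheme is checked, because the HTML parser decodes entities in attribute values and browsers ignore tabs, newlines and leading control characters in URLs.

```python
import html
import re
from urllib.parse import urlsplit

# Hypothetical allowlist, similar in role to wp_allowed_protocols().
ALLOWED_SCHEMES = {"http", "https"}
# ASCII control characters, space and DEL: removed before inspecting the scheme.
CONTROL_OR_SPACE_RE = re.compile(r"[\x00-\x20\x7f]")


def is_safe_href(href: str) -> bool:
    """Accept or reject an href; this does not escape it for output."""
    # 1. Decode entities, so "jav&#x61;script:" is checked as "javascript:".
    decoded = html.unescape(href)
    # 2. Remove characters that can hide a scheme, e.g. "java\tscript:".
    cleaned = CONTROL_OR_SPACE_RE.sub("", decoded)
    # 3. urlsplit() lowercases the scheme; "" means there is no scheme.
    scheme = urlsplit(cleaned).scheme
    if not scheme:
        # Hypothetical policy: only root-relative paths, not "//host".
        return cleaned.startswith("/") and not cleaned.startswith("//")
    return scheme in ALLOWED_SCHEMES
```

An accepted href still has to be HTML-escaped when it is written into the page.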

Gauthier

@Gauthier

And that, among other things, is why I told you that securing a forum is a huge task, and that I barely scratched the surface with my security tests. The number of attack vectors through encoding, invalid Unicode, and entities is incredible.

Lazeez Jiddan (Webmaster)

@Gauthier

You know, the simpler solution is to not try to make anything clickable 😈

Anyway, I've made some changes.

Gauthier

@Lazeez Jiddan (Webmaster)

Much better.

Oddly, Firefox has problems with URLs containing &: they are transformed into HTML entities in the href. That was mandatory for HTML4/XHTML and, officially, is no longer required in HTML5, but it should be tolerated.

Apparently, with HTML5 and Firefox, they are passed as entities to the server, which may then fail. I didn't test with other browsers.
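One possible explanation, offered as an assumption rather than a confirmed diagnosis: the & is being escaped twice. Escaping once (&amp;) is correct, because the HTML parser decodes it back to & before the browser requests the URL. Escaping an already-escaped value yields &amp;amp;, which decodes only to &amp;, and that literal text is what reaches the server. The sketch uses Python's html.escape, with html.unescape standing in for the browser's attribute decoding; href_attr is a hypothetical helper.

```python
import html


def href_attr(url: str) -> str:
    """Escape a URL once for use inside a double-quoted href attribute."""
    return html.escape(url, quote=True)


url = "/library/categ.php?key=humor&sort_order=desc"
once = href_attr(url)    # escaped once: correct
twice = href_attr(once)  # escaped twice: the suspected bug

print(once)                  # /library/categ.php?key=humor&amp;sort_order=desc
print(html.unescape(once))   # /library/categ.php?key=humor&sort_order=desc
print(html.unescape(twice))  # /library/categ.php?key=humor&amp;sort_order=desc
```

If this is the cause, the fix is to store the raw URL and escape it exactly once, at output time, including when a post is reopened for editing.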

Gauthier

@Lazeez Jiddan (Webmaster)

Obviously, but note that removing all links won't eliminate every attack vector ;)

Lazeez Jiddan (Webmaster)

@Gauthier

Sigh!

Yeah, these things suck. Big time!

Gauthier

At least your server tolerates & being replaced with &amp;

Lazeez Jiddan (Webmaster)

@Gauthier

Yes, but that breaks the URLs.

Fixed, I think.

Gauthier

@Lazeez Jiddan (Webmaster)

Not really; the [] are stripped from the URL.

See:

/library/categ.php?key[]=humor&storyType=&contRate[]=5&iip=1&lib=&rf=&ff=&author=&score=&minSize=&maxSize=&p=&sort_field=story_score&sort_order=desc&lc=AND&cmd=Search

becomes:

http://storiesonline.net/library/categ.php?key=humor&storyType=&contRate=5&iip=1&lib=&rf=&ff=&author=&score=&minSize=&maxSize=&p=&sort_field=story_score&sort_order=desc&lc=AND&cmd=Search

which doesn't work.
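If the brackets are lost because the URL regex's character class leaves them out, listing them explicitly is enough. A minimal sketch, assuming a character-class-based pattern like the hypothetical one below (URL_WITH_BRACKETS_RE and find_urls are illustrative names). Strictly, RFC 3986 reserves "[" and "]" for IPv6 host literals, so a fully conforming link would percent-encode them as %5B and %5D; in practice browsers tolerate them in the query string.

```python
import re

# Hypothetical URL pattern built from the RFC 3986 unreserved and reserved
# characters plus "%". "[" and "]" are listed explicitly, so PHP-style
# array parameters such as key[]=humor are kept in the link.
URL_WITH_BRACKETS_RE = re.compile(
    r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+"
)


def find_urls(text: str) -> list[str]:
    """Return every URL found in the text, brackets included."""
    return URL_WITH_BRACKETS_RE.findall(text)
```

This class also admits "," ")" and ".", so it should still be combined with trailing-punctuation trimming.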

Yes, those regexps suck real big time.

Lazeez Jiddan (Webmaster)

@Gauthier

Honestly, I don't care to support the URLs with square brackets.
