Discussion:
Entities in URLS
Rainer Jung
2004-07-19 12:39:50 UTC
Hello everyone!

I am having a discussion with a friend about the & characters of a URL in
HTML. In my opinion, any & character has to be replaced by &amp; in any
HTML version:
| <a href="script.cgi?p1=v1&p2=v2">link</a>
| <a href="script.cgi?p1=v1&amp;p2=v2">link</a>

Is the first line valid in any HTML version?

Rainer
David Dorward
2004-07-19 15:19:57 UTC
Post by Rainer Jung
I have a discussion with a friend about the &-characters of a URL in
HTML. In my opinion, any &-Character has to be replaced by &amp; in any
Either it must, or it need not. It isn't a matter of opinion.

The & must be encoded in HTML. It doesn't matter that it is "a URL", because
it is "a URL that is written in an HTML document".

This could be easily checked using the validator (or by reading the FAQ for
the validator).
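
As an illustration of that rule, here is a minimal Python sketch (not from the thread) that builds the query string first and then escapes it for use inside an href attribute. The parameter names p1 and p2 come from the original question; using Python's standard html.escape and urllib.parse.urlencode is an assumption about tooling, and any HTML-aware templating layer should do the same job.

```python
from html import escape
from urllib.parse import urlencode

# Step 1: build the URL itself. Here & is the query separator.
params = {"p1": "v1", "p2": "v2"}
url = "script.cgi?" + urlencode(params)

# Step 2: the URL is now being written into HTML, so escape it.
# quote=True also escapes quote characters for attribute values.
href = escape(url, quote=True)
link = f'<a href="{href}">link</a>'

print(url)   # script.cgi?p1=v1&p2=v2
print(link)  # <a href="script.cgi?p1=v1&amp;p2=v2">link</a>
```

The two steps are kept separate on purpose: the first concerns URL syntax, the second concerns HTML syntax.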
--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Nikolaos Giannopoulos
2004-09-04 16:11:34 UTC
Post by Rainer Jung
Hello everyone!
I have a discussion with a friend about the &-characters of a URL in
HTML. In my opinion, any &-Character has to be replaced by &amp; in any
| <a href="script.cgi?p1=v1&p2=v2">link</a>
| <a href="script.cgi?p1=v1&amp;p2=v2">link</a>
Is the first line valid in any html-Version!?
No. Actually the '&' chars in a URL are used to specifically separate
parameter name-value pairs passed into the base URL. In your latter
case the variable 'p2' will be seen as 'amp;p2' which is not correct.

Check out the html specs at www.w3c.org for more information.

--Nikolaos
David Dorward
2004-09-04 20:12:14 UTC
Post by Nikolaos Giannopoulos
No. Actually the '&' chars in a URL are used to specifically separate
parameter name-value pairs passed into the base URL.
Correct.
Post by Nikolaos Giannopoulos
In your latter
case the variable 'p2' will be seen as 'amp;p2' which is not correct.
Wrong. In HTML &amp; means a literal &. So while it might be a URL, you
still have to write &amp; because it is a URL that is WRITTEN IN HTML.
Post by Nikolaos Giannopoulos
Check out the html specs at www.w3c.org for more information.
http://w3.org/TR/html4/appendix/notes.html#h-B.2.2
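
This can be checked with a small Python sketch, assuming the standard library's html.parser as a stand-in for a browser's HTML parser. HrefCollector is a hypothetical helper written only for this illustration; it is not part of any specification or of the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class HrefCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    # The parser has already decoded entity references here.
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<a href="script.cgi?p1=v1&amp;p2=v2">link</a>')
collector.close()

url = collector.hrefs[0]
query = parse_qs(urlsplit(url).query)

print(url)    # script.cgi?p1=v1&p2=v2
print(query)  # {'p1': ['v1'], 'p2': ['v2']}
```

The script on the receiving end sees the parameter as p2, not amp;p2, because the entity reference is resolved by the HTML parser before the URL is ever used.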
--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Jukka K. Korpela
2004-09-04 20:19:15 UTC
Post by Nikolaos Giannopoulos
Post by Rainer Jung
Hello everyone!
I have a discussion with a friend about the &-characters of a URL in
HTML. In my opinion, any &-Character has to be replaced by &amp; in
| <a href="script.cgi?p1=v1&p2=v2">link</a>
| <a href="script.cgi?p1=v1&amp;p2=v2">link</a>
Is the first line valid in any html-Version!?
No.
Well, that comment is correct, but the question was already answered the
same day it was asked, over a month ago. I wonder why you comment on the
Post by Nikolaos Giannopoulos
Actually the '&' chars in a URL are used to specifically
separate parameter name-value pairs passed into the base URL.
The ampersand has certain special use in query parts of URLs, but this
has nothing to do with validity. And "the base URL" is an incorrect term
here (check URL specifications for a definition of that term).
Post by Nikolaos Giannopoulos
In
your latter case the variable 'p2' will be seen as 'amp;p2' which is
not correct.
No, that's definitely incorrect. The URL on that line is
script.cgi?p1=v1&p2=v2
but the ampersand has been represented using an entity reference, since
in HTML, &p2 would constitute an entity reference that refers to an
undefined entity, which is an error.
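
A rough Python illustration of why the bare ampersand is risky, with loud caveats: html.unescape applies lenient HTML5-style decoding to plain text, not HTML 4 validation, so it shows the practical hazard rather than the validator's error. The parameter name copy is made up for the example.

```python
from html import unescape

# "&p2" names no known entity, so this lenient decoder passes it through,
# although an HTML 4 validator still reports it as an error.
harmless = unescape("script.cgi?p1=v1&p2=v2")

# "&copy" matches a known entity name (hypothetical parameter "copy"),
# so the bare ampersand silently corrupts the URL.
corrupted = unescape("script.cgi?p1=v1&copy=2")

# Writing the ampersand as &amp; keeps the URL intact.
safe = unescape("script.cgi?p1=v1&amp;copy=2")

print(harmless)
print(corrupted)
print(safe)
```

Whether a given parameter name collides with an entity name depends on the name, which is one more reason to write &amp; every time instead of relying on luck.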
--
Yucca, http://www.cs.tut.fi/~jkorpela/