I'm using Apache with mod_rewrite to rewrite URLs, but I am having problems with the plus sign.
With the following rule:
RewriteRule ^/(.+[^/])/?$ http://localhost:8080/app/home?tag=$1 [P,L] 
Both:
http://localhost/1+2 and http://localhost/1%2B2
end up as
uri=http://localhost:8080/app/home, args=tag=1+2
So in both cases the application translates the plus sign into a space, and it can no longer differentiate between spaces and plus signs.
If I use the "B" flag, then in both cases the + signs are translated into %2B, and the application ends up with the same problem in reverse (both spaces and plus signs arrive as plus signs).
Is there a way to get Apache to pass %2B through as a plus sign and not a space?
I read something about mod_security, but I am not using that, so I am not sure whether some other security mechanism is causing this.
Any help would be greatly appreciated!
No, this isn't quite the same as the referenced question. The problem here is specifically plus signs, and the answer to "Apache: mod_rewrite: Spaces & Special Characters in URL not working" doesn't address that.
There's also an issue with slashes, for which see http://httpd.apache.org/docs/current/mod/core.html#allowencodedslashes (but you do need access to the Apache config to do this - .htaccess won't do).
In fact, it is impossible to do this using a rewrite rule alone. Apache decodes the URL before putting it through rewrite, but it doesn't treat plus signs as spaces: http://example.com/a+b.html wouldn't deliver a file called "a b.html".
Plus signs are turned into spaces by PHP when it builds the $_GET array (or by whatever the relevant language mechanism is), and only for query strings, because that is how browsers encode form submissions. So Apache will translate %2B to + before applying the rewrite, and leave + itself alone, meaning you can't tell the difference.
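To make the loss of information concrete, here is a minimal PHP sketch of the two decoding steps. It assumes rawurldecode() is a fair stand-in for Apache's path unescaping (it is not Apache's actual code), and the variable names are purely illustrative.

```php
<?php
// Two different requests as a client might send them (still encoded).
$rawPlus    = '1+2';    // literal plus sign in the path
$rawEncoded = '1%2B2';  // percent-encoded plus sign

// Assumption: rawurldecode() approximates what Apache does to the path.
// It only handles %XX sequences, so both requests collapse to one string.
$pathPlus    = rawurldecode($rawPlus);
$pathEncoded = rawurldecode($rawEncoded);

// The rewrite then copies that string into the query string, where
// query-string decoding (urldecode) also turns "+" into a space.
$tagFromPlus    = urldecode($pathPlus);
$tagFromEncoded = urldecode($pathEncoded);

var_dump($pathPlus === $pathEncoded);      // same string after path decoding
var_dump($tagFromPlus, $tagFromEncoded);   // both end up with a space
```

Once both inputs have become the same string, no later step can recover which form the client sent.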
Of course, one could argue that + used as space is simply invalid in such URLs and one should use only %20. However, if you don't have control over generating them, you're bound to see them. Browsers won't generate them automatically though.
The answer is DIY, and in many ways it is more predictable and simpler:
RewriteRule .* index.php [L]
Hence everything turns into index.php and there's no attempt to construct a query string. If you want to exclude certain patterns, e.g. those with slashes in, or where an explicit file does exist, the obvious amendments apply. e.g. RewriteCond %{REQUEST_FILENAME} !-f
Then in index.php
$uri = substr($_SERVER['REQUEST_URI'], 1); // remove leading slash
$qmpos = strpos($uri, '?'); // is there a question mark, if so where
if ($qmpos !== FALSE) { $uri = substr($uri, 0, $qmpos); } // only the bit before q.m.
$decoded = urldecode($uri); // decode the part before the question mark
if (! empty($decoded)) { $_GET['args'] = $decoded; } // add result to $_GET
That takes the original request (excluding the leading slash and any additional query string), decodes it according to PHP's normal rules, and puts the result into $_GET['args'], so you can process it along with the rest of the $_GET query string parameters in the usual way. The offset would be slightly different if you're deeper down a hierarchy, but the principle is the same.
I believe this should work for empty URLs (http://example.com/) or those which only have a query string (http://example.com/?foo=1), as well as the simple case (http://example.com/bar) and the case with a query string as well (http://example.com/bar?foo=1). No doubt similar approaches will work for other languages.
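One way to check those four URL shapes is to wrap the same steps in a function and feed it sample REQUEST_URI values. The sketch below does that; the function name decode_request_path is hypothetical, and the (string) casts are an added precaution for older PHP versions where substr() can return false.

```php
<?php
// Hypothetical helper: takes a REQUEST_URI value and returns the decoded
// path, without the leading slash and without any query string.
function decode_request_path(string $requestUri): string
{
    $uri = (string) substr($requestUri, 1);        // remove leading slash
    $qmpos = strpos($uri, '?');                    // position of "?", if any
    if ($qmpos !== false) {
        $uri = (string) substr($uri, 0, $qmpos);   // keep only the part before it
    }
    return urldecode($uri);                        // PHP's normal decoding rules
}

$empty     = decode_request_path('/');
$queryOnly = decode_request_path('/?foo=1');
$simple    = decode_request_path('/bar');
$both      = decode_request_path('/bar?foo=1');

var_dump($empty, $queryOnly, $simple, $both);
```

In the two cases with no path the function returns an empty string, which is why the original snippet guards the assignment with empty().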
In your particular case, you actually don't want the pluses decoded in the PHP at all. That's fine: use rawurldecode instead, which decodes %XX sequences but leaves plus signs alone.
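As a rough illustration of that variant, the sketch below runs three sample request URIs through the same steps with rawurldecode(). The function name raw_request_path and the sample URIs are assumptions made for the example.

```php
<?php
// Hypothetical helper: same steps as the index.php snippet, but using
// rawurldecode(), which decodes %XX sequences and leaves "+" untouched.
function raw_request_path(string $requestUri): string
{
    $uri = (string) substr($requestUri, 1);        // remove leading slash
    $qmpos = strpos($uri, '?');
    if ($qmpos !== false) {
        $uri = (string) substr($uri, 0, $qmpos);   // drop the query string
    }
    return rawurldecode($uri);
}

$plus    = raw_request_path('/1+2');    // literal plus stays a plus
$encoded = raw_request_path('/1%2B2');  // %2B becomes a plus
$space   = raw_request_path('/1%202');  // %20 becomes a space

var_dump($plus, $encoded, $space);
```

With this approach a plus sign always means a plus sign, and a space can only come from %20, so the application can tell the two apart again.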