URL query parameters with Scandinavian characters parsed weirdly in servlet

PetriJuhaniRiipinen · January 27, 2022, 3:00pm

I have a strange issue with parsing a Scandinavian character from a URL query parameter.

So here is the query parameter that I’m receiving in the request:

name=%C3%96vertid

That is printed straight from HttpServletRequest.getQueryString() in an action controller. It should decode into ‘Övertid’, with the Ö character at the front.

When Confluence sets the name-parameter value into my controller, I get that value exactly like this:

“Ã vertid” → Hex c3 83 c2 96 76 65 72 74 69 64

“vertid” would be 76 65 72 74 69 64 so there is c3 83 c2 96 before it.

Then I decode it into a string like this:

this.name = new String(name.getBytes(StandardCharsets.ISO_8859_1));

And this produces exactly ‘Övertid’ in UTF-8 → Hex c3 96 76 65 72 74 69 64, where c3 96 is the Ö, which looks fine.

So basically the conversion works on my local machine, although I doubt it should be needed at all. I’m very doubtful about doing the ISO_8859_1 decoding there, but that’s the only way I can get the string to the correct value on my dev env.
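To make the byte-level picture concrete, here is a minimal Java sketch of what seems to be happening, assuming the container percent-decodes the query string as ISO-8859-1. The class and method names (QueryDecodingDemo, decode, repair) are made up for illustration and are not part of Confluence or Tomcat. Note that repair names UTF-8 explicitly, whereas the one-argument new String(bytes) falls back to the JVM's default charset, so it can behave differently from one machine to another.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class QueryDecodingDemo {

    // Percent-decode a raw query value using the given charset,
    // roughly what a container does according to its URI encoding setting.
    static String decode(String raw, Charset charset) {
        try {
            return URLDecoder.decode(raw, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for the standard charsets used here.
            throw new IllegalStateException(e);
        }
    }

    // Undo an ISO-8859-1 misdecoding of UTF-8 bytes:
    // turn each char back into its original byte, then decode as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "%C3%96vertid";

        // Wrong charset: two chars (U+00C3, U+0096) instead of one Ö.
        String garbled = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());  // 8 chars instead of 7

        // Explicit repair does not depend on the platform default charset.
        System.out.println(repair(garbled));

        // Right charset from the start: no repair needed.
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```

The repair step only makes sense when the value really was misdecoded as ISO-8859-1. Applied to an already correct string it would damage it, which is one reason the workaround is fragile across servers.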

However, when my QA guy tries this on his QA Confluence-server, the URL is exactly the same, but when the setName-setter is called, he gets this: ‘?vertid’ → Hex 3f 76 65 72 74 69 64, so for him there is just 3f at the front of the value.
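One way a single 3f can show up, offered as a guess and not confirmed anywhere in this thread: 3f is '?', which is what Java substitutes when a character cannot be represented in the target charset. The sketch below (HexDump is a hypothetical class) encodes the same strings with different charsets. If the QA server already handed over a correct ‘Övertid’ and some later step encoded it as US-ASCII, for example through a platform default charset, the result would be exactly 3f followed by ‘vertid’.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class HexDump {

    // Render bytes as space-separated lowercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String correct = "\u00D6vertid";          // Övertid
        String garbled = "\u00C3\u0096vertid";    // UTF-8 bytes misread as ISO-8859-1

        // Correct string, UTF-8 dump: c3 96 76 65 72 74 69 64
        System.out.println(hex(correct.getBytes(StandardCharsets.UTF_8)));

        // Garbled string, UTF-8 dump: c3 83 c2 96 76 65 72 74 69 64
        System.out.println(hex(garbled.getBytes(StandardCharsets.UTF_8)));

        // Correct string, US-ASCII dump: Ö is unmappable and becomes '?'
        // 3f 76 65 72 74 69 64
        System.out.println(hex(correct.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Dumping with an explicit charset, instead of the no-argument getBytes(), keeps the hex output comparable between the dev and QA machines.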

What on earth is going on here? Browser is the same, URL is the same but we are getting totally different values injected into the setName-setter in our instances.

request.getCharacterEncoding() has the value ‘UTF-8’ on the request.

Any ideas how the value gets screwed? Any encoding settings that I could try in … hmm… Tomcat? Or some servlet settings? Where is the code that actually parses the query string parameters before injecting them into the servlet-setters?

PetriJuhaniRiipinen · February 1, 2022, 10:49am

Found the root cause and the solution for this. It is in this Connector configuration element:

<Connector port="8090" connectionTimeout="20000" redirectPort="8443"
                   maxThreads="48" minSpareThreads="10"
                   enableLookups="false" acceptCount="10" debug="0"
                   protocol="org.apache.coyote.http11.Http11NioProtocol"
                   URIEncoding="ISO-8859-1"/>

URIEncoding is different from what is used in the URL, so of course the UTF-8 encoded Scandinavian characters won’t be decoded properly for the servlet.

So changing that to:

URIEncoding="UTF-8"

Makes it work fine.
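For cases where the server.xml cannot be changed, one possible fallback is to decode the raw query string in the plugin itself, since getQueryString() returned the value still percent-encoded above. This is only a rough sketch under that assumption: RawQueryParser and parseUtf8 are hypothetical names, it assumes clients always send UTF-8, and it keeps only the first value for a repeated parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class RawQueryParser {

    // Parse a raw (still percent-encoded) query string, always decoding as UTF-8,
    // independent of the connector's URIEncoding setting.
    static Map<String, String> parseUtf8(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // First occurrence wins; later duplicates are ignored.
            params.putIfAbsent(decode(key), decode(value));
        }
        return params;
    }

    // URLDecoder throws IllegalArgumentException on malformed percent sequences;
    // that is deliberately left to propagate to the caller.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

In a controller this would be called as RawQueryParser.parseUtf8(request.getQueryString()).get("name"), bypassing the value injected into the setter.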

Looks like the URIEncoding varies between ISO-8859-1 and UTF-8 depending on the Confluence Server version. And in some server versions it’s totally missing.

I wonder why… A little bit of consistency here would be much appreciated, Atlassian. You certainly have customers in non-English-speaking countries too.

I wonder what breaks when I recommend our customers to set that to UTF-8, hopefully nothing.