Kitz Forum
Chat => Tech Chat => Topic started by: roseway on December 11, 2007, 07:28:11 AM
-
This is just a bit of curiosity on my part. On two different forums today (not this one) I've seen postings by people in which a few of the characters have been displayed as symbols looking like little square dominos with 4 hex characters instead of spots. What are these?
-
I dunno... maybe they are trying to be smart and using some kind of dingbats font.
-
Could you post up a screenshot perhaps, Eric? Might help to identify the mystery characters!
-
Here's an example - look in the first Code section near the top. The word 'vga' is enclosed in these characters.
http://www.linuxformat.co.uk/index.php?name=PNphpBB2&file=viewtopic&t=7109
-
And here's a screenshot:
(https://forum.kitz.co.uk/proxy.php?request=http%3A%2F%2Ffarm3.static.flickr.com%2F2039%2F2102445425_5df2da05b8_o.png&hash=a542affbe28cd1e695b0f084a06dfa284b95a79a)
-
Interesting - those are ASCII control characters 0x1C and 0x1D - odd!
What I reckon is that the characters were originally displayed in the console in some Unicode encoding or other, and then somewhere in the copy/paste to the forum, they've been 'converted' back to single-byte characters, and obviously the non-standard ones have been lost!
Actually, on investigating further, I think I know what it is, and my first thought was right :). If you go into Character Map in Windows, and look at the Unicode punctuation characters (just to narrow it down a bit) - you can see the "shaped" quotation marks are double-byte encoded characters U+201C and U+201D (see attached screenshot) - corresponding to the dodgy characters in the posts, just without the high-order byte (0x20)!
Problem explained, if not solved! :)
*edit* Your Linux charset displays the hex equivalent of a non-ASCII character... I had to find out what they were the hard way :P
[attachment deleted by admin]
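The dropped-high-byte theory can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the quotes really were U+201C and U+201D to begin with; it says nothing about which piece of software did the truncating. The helper name drop_high_byte is made up for illustration.

```python
import unicodedata


def drop_high_byte(ch: str) -> str:
    """Keep only the low-order byte of a code point (hypothetical helper)."""
    return chr(ord(ch) & 0xFF)


for quote in ("\u201c", "\u201d"):
    mangled = drop_high_byte(quote)
    # Show the original code point, its Unicode name, and what is left
    # once the high-order byte has gone.
    print(
        f"U+{ord(quote):04X} {unicodedata.name(quote)} -> "
        f"0x{ord(mangled):02X} (category {unicodedata.category(mangled)})"
    )
```

Both results come out in category Cc, i.e. control characters with no glyph of their own, which would fit with them showing up as boxes, or not at all, depending on the system.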
-
A good bit of investigation Chris, despite your rudeness about my character set. It seems that this excellent OS of mine tries to be extra helpful when an unprintable character is displayed. :P
We've got a character selector too, by the way 8)
[attachment deleted by admin]
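For anyone wondering what that "extra helpful" display amounts to, here is a rough Python imitation: printable characters pass through untouched, anything else is shown as its four-digit hex code. It is a guess at the general idea, not how any particular browser or font renderer actually does it, and hex_box is a made-up name.

```python
def hex_box(text: str) -> str:
    """Show each non-printable character as its 4-digit hex code in brackets."""
    return "".join(
        ch if ch.isprintable() else f"[{ord(ch):04X}]"
        for ch in text
    )


# The word 'vga' wrapped in the two stray control characters.
print(hex_box("\x1cvga\x1d"))  # [001C]vga[001D]
```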
-
I see the offending characters as pi and not equals.
-
I don't see them at all - don't you just love DBCS :D
-
>> A good bit of investigation Chris, despite your rudeness about my character set.
Lol - actually I wasn't meaning to be rude when I posted that, it just came out wrong - I was thinking how good it was that it displayed the character code!! But it didn't come across like that, sorry :'(
-
Yes I agree - much nicer to see the real code than have the s/w randomly decide (well not quite but :P) what to display/not display :)
-
>> Lol - actually I wasn't meaning to be rude when I posted that, it just came out wrong
Oh please, I wasn't offended, I was joking. Perhaps what I said came out wrong. :)
-
>> I don't see them at all - don't you just love DBCS :D
I had to think about 'DBCS' so I Googled for it, and this was one of the hits on the first page :lol:
[attachment deleted by admin]
-
>> I see the offending characters as pi and not equals.
Wouldn't standardisation be nice! :)
-
rofl :lol: as if to prove a point, eh? :P
And this text-based communication, no matter how good we are at it, sometimes does mean we don't always successfully get across the manner in which we're trying to say something. I did have a feeling you were joking.
Perhaps I should have replied in Unicode, then everyone would have read my post differently :lol:
-
See how easy it is for messages to get lost in translation... maybe non-standard character sets hold the answers to the riddles of the universe.
-
Maybe it's something to do with parallel universes.
-
The thing is that on my Windows lappie I don't even see quotation marks. I rather suspect that's because I do have the right locale and codepages set and most Windows installs I see don't.
Think DBCS is bad now? It was stupendously horrible in the mid-90s :lol:
-
Rizla, if you're talking about that webpage in question, then no, I don't see quotation marks either... the reason is because the high byte is missing... probably the way the text is rendered / stored in the database / who knows!
Do you see the right characters if you go to charmap as I did?
-
You're quite right there Chris. I would see the quotation marks too if the right Unicode characters were used.
-
Quite possibly Chris although I haven't checked. DBCS (or Unicode-whatever if you prefer that) is a minefield. I'm sure MS have a couple of DBCS which are wrongly identified too but frankly life is too short to bother :D
I'm still very impressed with the real hex code being displayed when the s/w didn't know what to do. Then again if it's taken all of us to work it out then maybe it could be clearer? I'm no HCI designer though so what do I know? :)
-
>> I'm still very impressed with the real hex code being displayed when the s/w didn't know what to do.
It's certainly a new one on me, and I'm going to ask around a bit in Linux circles.
-
Someone else who uses PCLinuxOS finds the same as Floydy, i.e. pi and not equals before and after 'vga'. pi appears to be Unicode character 03C0, and I haven't found 'not equals' yet, so goodness knows where those come from.
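One way to track down a code point is to look the character up by its official Unicode name, which Python's standard unicodedata module can do. A small sketch, assuming the glyphs people are seeing really are the standard pi and not-equal-to sign; it doesn't explain why they turn up in place of the quotes.

```python
import unicodedata

for name in ("GREEK SMALL LETTER PI", "NOT EQUAL TO"):
    ch = unicodedata.lookup(name)  # raises KeyError if the name is unknown
    print(f"{name}: U+{ord(ch):04X} {ch}")
```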
-
Not equal to is Unicode 2260, so like Eric, dunno where they come from!!
>> I'm still very impressed with the real hex code being displayed when the s/w didn't know what to do.
Probably simply that the character code is mapped in the character set font for characters that aren't displayed properly? Then again that's probably second nature to me as I remember redefining characters in an 8x8 bitmap on the BBC micro in order to produce simple graphics ;) Maybe it's done by the font renderer as a more general thing... you could check I suppose by seeing if it does it when using a Windows TTF file.
>> Then again if it's taken all of us to work it out...
Oy - what's all this 'us' about - I'm sure I figured it out by myself :P <grin>