Question Displaying unicode

dcs.79c · Mar 23, 2011

What is the source code to display Unicode text in a textbox?

I basically know how to display ASCII code in a text box, but I'd like to be able to display Unicode in a text box.

Thank you.
David

jmcilhinney · Mar 24, 2011

.NET Strings are Unicode by nature, so all text you display in a .NET app is Unicode text. You're going to have to be more specific. Do you mean that you want to read a Unicode-encoded text file and display its contents in a TextBox? Something else? Please ALWAYS provide a FULL and CLEAR description of the ENTIRE problem.

JohnH · Mar 24, 2011

The font in use must also support the characters in question. The default font, Microsoft Sans Serif, does so for many character sets.

dcs.79c · Mar 25, 2011

I want to display text in a textbox that is not included in ASCII. For example, musical notation, religious symbols, astrological symbols.
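
A minimal sketch of what that could look like, assuming a WinForms project and an installed font that contains the glyphs. The font name "Segoe UI Symbol", the module name, and the three code points are assumptions chosen for illustration; characters outside ASCII are built from their code points with ChrW:

```vbnet
Imports System
Imports System.Drawing
Imports System.Windows.Forms

Module UnicodeTextBoxDemo
    <STAThread()> _
    Sub Main()
        Dim box As New TextBox()
        box.Multiline = True
        box.Dock = DockStyle.Fill
        ' Assumed font; if it is not installed, Windows falls back to a default font.
        box.Font = New Font("Segoe UI Symbol", 16.0F)

        ' U+266B beamed eighth notes, U+2721 Star of David, U+2648 Aries.
        box.Text = "Music: " & ChrW(&H266B) & _
                   "  Religion: " & ChrW(&H2721) & _
                   "  Astrology: " & ChrW(&H2648)

        Dim form As New Form()
        form.Text = "Unicode in a TextBox"
        form.Controls.Add(box)
        Application.Run(form)
    End Sub
End Module
```

Whether each symbol shows as a glyph or as a square depends on the fonts available on the machine.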

Wibs · Mar 26, 2011

Contrary to a previous reply

.NET Strings are Unicode by nature, so all text you display in a .NET app is Unicode text.

I have now found the answer to your question, and to mine.

After much Googling, I found it very curious that this question has been asked many times since 2005 and none of the threads has a clean answer. In many of the forum hits and Microsoft Support pages that Google threw up, the question had zero responses.

The simple answer is that Unicode characters CANNOT be displayed in TextBoxes or RichTextBoxes. Although Visual Basic is indeed based internally on the double-byte Unicode standard, data displayed in a TextBox is converted from the internal Unicode representation to an ANSI representation before it is written to the control, so Unicode strings will not display correctly.

The only solution that has been offered in the past is a workaround, and involves using a Forms 2.0 control (which can display Unicode characters correctly).

Unfortunately, Forms 2.0 is part of Microsoft Office and is not redistributable. Therefore, you cannot distribute Forms 2.0 (fm20.dll) with your application. It must already be on the target machines.

If you want to go down the Forms 2.0 route (which is far from ideal, and I would recommend Java, which has no such limitations), then an intro to the whole problem, and the Forms 2.0 approach, is in this MS article.

How To Read and Display UNICODE String on Visual Basic Form

Good luck,

Wibs

dcs.79c · Mar 26, 2011

Thank you for doing the research.

Well, darn, I was hoping that there is a solution.

Please explain the following from your reply: "...and I would recommend Java, which has no such limitations...". Can I use Java inside a VB program? I think that I can. Can I use JavaScript inside a VB program? I think that I can. Hmm. Could I use VB with embedded Java (or is it JavaScript?) to open a browser that displays the Unicode characters? Would that work? What is the source code to do that?

I know that I can get software that creates fonts. Let's say that I want to create Klingon fonts. I think that fonts have to be registered on the PC don't they? I assume, though, that the Klingon fonts have to be on the target PC as well, correct? Can a VB program install fonts & register them? I doubt if the person installing the VB program would go for that.

Thank you.
David

Wibs · Mar 26, 2011

I guess it depends on what the ultimate goal of your application is.

I wanted to make a standalone app: a text editor with a single embedded font, in my case a Medieval Latin font, as this was to be a Medieval Latin text editor. I wanted users to be able to type normally, but when they needed one of the special Medieval Latin characters they would just press a toolbar button showing that character to insert it, rather than searching through the entire Unicode character map. As there were only 26 special characters, this was easily achieved with 26 buttons on a toolbar.

So, I made my font, using Unicode v. 6.0, but found, as you did, that VB TextBoxes or RichTextBoxes cannot display the Unicode characters above 00FF, which is where all my characters were located.

I then wondered if the same could be accomplished via Java, so I took a simple piece of free Java text editor code, placed a button on the toolbar that simply inserted a character into the text area, and gave it the Unicode code for the character I wanted. It inserted the character correctly and natively, with no workarounds required. I am now working on completing the rest of the Java UI.

I now have the option of distributing this app as a standalone utility, or I could embed it easily into a webpage to make it an online Medieval Latin Editor.

So this approach might work for you, depending on your goals.

Another workaround I came up with, which I have tried and which works, is to remap your font. It avoids the problem with the other workaround, namely the non-redistributable Forms 2.0.

I knew that VB TextBoxes only work with character codes up to 00FF (the ANSI/ASCII limitation), and that normal ANSI (keyboard) characters reside between 0021 and 007F. So I took the codes above this, i.e. 0080-00FF (where the extra European letters, maths and copyright symbols, etc. usually reside), and used a font editor to replace the ones I didn't need with the Medieval Latin symbols I did need. I then saved the new font out as single-byte UTF-8.

Selecting this font for my RichTextBox, and creating buttons that insert the required symbol, now within the acceptable ANSI range, solved it.

If you use a custom font though, it either has to be on the machine of those you distribute your app to, or it must be embedded within the app.
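
One way to ship a font with the app without installing it system-wide is to load the font file at run time with PrivateFontCollection from System.Drawing.Text. The following is only a sketch under assumptions: the file name "MyCustomFont.ttf" and the helper LoadPrivateFont are hypothetical, the file is assumed to sit next to the executable, and whether a given control renders a privately loaded font may depend on the framework version, so it needs testing on the target machines:

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Text
Imports System.IO
Imports System.Windows.Forms

Module PrivateFontDemo
    ' The collection must stay alive for as long as fonts created from it are in use.
    Private ReadOnly Fonts As New PrivateFontCollection()

    ' Hypothetical helper: loads one font file and returns a Font of the given size.
    Function LoadPrivateFont(ByVal fontPath As String, ByVal size As Single) As Font
        Fonts.AddFontFile(fontPath)
        ' Only one file is loaded in this sketch, so the first family is the one we want.
        Return New Font(Fonts.Families(0), size)
    End Function

    <STAThread()> _
    Sub Main()
        ' Assumed location: a .ttf file distributed alongside the .exe.
        Dim fontPath As String = Path.Combine(Application.StartupPath, "MyCustomFont.ttf")

        Dim box As New TextBox()
        box.Multiline = True
        box.Dock = DockStyle.Fill
        If File.Exists(fontPath) Then
            box.Font = LoadPrivateFont(fontPath, 14.0F)
        End If

        Dim form As New Form()
        form.Text = "Private font demo"
        form.Controls.Add(box)
        Application.Run(form)
    End Sub
End Module
```

The font's licence still has to permit redistribution with the application.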

Hope these pointers help in some way.

Wibs

jmcilhinney · Mar 26, 2011

Wibs is incorrect. Try this, as I just did.

1. Open the Windows Character Map applet. It should display the Arial font by default.
2. Scroll down to the bottom of the map.
3. Double-click each of the last four characters. They should have values FEF9 - FEFC.
4. Click the Copy button.
5. Create a WinForms project and add a TextBox to the form.
6. Select the Text property of the TextBox and hit Ctrl+V.
7. Run the project.

You should see those four characters copied from the Character Map in your TextBox, as I did.
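
The same check can be sketched in code instead of going through the clipboard. This assumes the installed Arial font contains U+FEF9 to U+FEFC, as Character Map suggests; the module and variable names are illustrative only:

```vbnet
Imports System
Imports System.Drawing
Imports System.Text
Imports System.Windows.Forms

Module CharacterRangeDemo
    <STAThread()> _
    Sub Main()
        ' Build one line per code point: the hex value followed by the character itself.
        Dim sb As New StringBuilder()
        For code As Integer = &HFEF9 To &HFEFC
            sb.AppendLine(String.Format("U+{0:X4}  {1}", code, ChrW(code)))
        Next

        Dim box As New TextBox()
        box.Multiline = True
        box.Dock = DockStyle.Fill
        box.Font = New Font("Arial", 16.0F)
        box.Text = sb.ToString()

        Dim form As New Form()
        form.Text = "U+FEF9 to U+FEFC"
        form.Controls.Add(box)
        Application.Run(form)
    End Sub
End Module
```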

Wibs · Mar 26, 2011

This is interesting. The copied Unicode text DID paste OK into the RichTextBox, which does appear to contradict the MS article I quoted from. It even works using Paste programmatically:

VB.NET:

Private Sub ToolStripButton3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ToolStripButton3.Click
    ' Paste the clipboard contents, then move the caret to the end of the text.
    RichTextBox1.Paste()
    RichTextBox1.Select(RichTextBox1.Text.Length, 0)
End Sub

However, attempting to insert characters, not via the Clipboard, but programmatically, is still defying a solution.

If I have a single RichTextBox and two buttons, and the code behind them is this:

VB.NET:

Private Sub ToolStripButton1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ToolStripButton1.Click
    ' Insert "a" at the caret position, then move the caret to the end of the text.
    RichTextBox1.Text = RichTextBox1.Text.Insert(RichTextBox1.SelectionStart, "a")
    RichTextBox1.Select(RichTextBox1.Text.Length, 0)
End Sub

Private Sub ToolStripButton2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ToolStripButton2.Click
    ' Insert "b" at the caret position, then move the caret to the end of the text.
    RichTextBox1.Text = RichTextBox1.Text.Insert(RichTextBox1.SelectionStart, "b")
    RichTextBox1.Select(RichTextBox1.Text.Length, 0)
End Sub

Then clicking those buttons inserts the letters 'a' and 'b' respectively. However, if I paste any of the characters from your example between the quotes (where they do show as those Arabic characters), assign the Arial font to my RichTextBox, and then click the buttons, I just get square boxes. Similarly, I can use the following:

VB.NET:

RichTextBox1.Text = RichTextBox1.Text.Insert(RichTextBox1.SelectionStart, ChrW(&H0022))

ChrW(&Hxxxx) values up to &H00FF all display OK, but beyond 00FF all that displays is squares.
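
For comparison, here is a sketch of a different way to insert at the caret: assigning to SelectedText instead of rebuilding the whole Text property, which replaces the entire contents of the control on every click. This is offered as an alternative to try, not as a confirmed explanation of the squares. The form, control names, and the code point U+FEF9 are assumptions for illustration:

```vbnet
Imports System
Imports System.Drawing
Imports System.Windows.Forms

Public Class InsertDemoForm
    Inherits Form

    Private ReadOnly editor As New RichTextBox()
    Private ReadOnly insertButton As New Button()

    Public Sub New()
        editor.Dock = DockStyle.Fill
        editor.Font = New Font("Arial", 14.0F)

        insertButton.Dock = DockStyle.Top
        insertButton.Text = "Insert U+FEF9"
        AddHandler insertButton.Click, AddressOf InsertButton_Click

        ' The Fill-docked editor is added first so it takes the space left by the button.
        Controls.Add(editor)
        Controls.Add(insertButton)
    End Sub

    Private Sub InsertButton_Click(ByVal sender As Object, ByVal e As EventArgs)
        ' Replaces the current selection, or inserts at the caret when nothing is selected.
        ' The rest of the document is left untouched.
        editor.SelectedText = ChrW(&HFEF9)
        editor.Focus()
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.EnableVisualStyles()
        Application.Run(New InsertDemoForm())
    End Sub
End Class
```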

Wibs

JohnH · Mar 26, 2011

Wibs said:
This is interesting. The copied Unicode text DID paste OK into the RichTextBox, which does appear to contradict the MS Article I quoted from.

The article is clearly for VB6 or older and is not relevant to the current .NET platform. VB and .NET have no problems with Unicode or any other text encoding.
For example, enumerating all chars in the Arial font displays them correctly here.

Wibs · Mar 29, 2011

OK, after lots of experimenting (and using our work MS support account), we have confirmed that there is indeed a problem, but the source is still eluding us.

The experiment set-up was as follows:

VB2005
Single RichTextBox, single Button
The Button code was simply:
RichTextBox1.Text = RichTextBox1.Text.Insert(RichTextBox1.SelectionStart, "a")
or
RichTextBox1.Text = RichTextBox1.Text.Insert(RichTextBox1.SelectionStart, ChrW(&HA752))
RichTextBox font was set to Andron Scriptor Web

This is what we found:

MSGuru: Have you tried copy/pasting a desired Unicode character from Character Map into the RTB?
Me: Yes, I was asked to try that in one of the forums and it worked OK

MSGuru: Did you try copy/pasting that character into the quotes, replacing "a"?
Me: Yes, but it didn't work, but will try again - Ok, when pasted it shows a square in the form code, but when I go to Debugging, and press the button it displays the character!

MSGuru: OK, get the Unicode code for that character, and use it with ChrW, what happens then?
Me: That displays too!

MSGuru: Ok, so what is the problem?
Me: let me check....

Me: OK, if I open Character Map, load the Andron font, and go high up the map, to say, E100, and use the character there or the code, it displays the character OK. However, all the codes for the Latin symbols in the Andron font are in the range A750 - A770, but Character Map does NOT show them! Character Map jumps from 02xx to E0xx (right over the characters I want). So it appears that if Character Map does not see the character, neither will the RichTextBox? So why doesn't Character Map see all the Unicode characters in this font?

MSGuru: and in Character Map you do have Unicode selected?
Me: yes

MSGuru: and how do you know that the Andron font has the characters you want?
Me: because they show it on their website, and if I load the Andron font into a Unicode Font Viewer, such as Babelmap, and scroll down to A750 they are all there.

MSGuru: Puzzler, will have to get back to you.

Anyone have any thoughts on this before MSGuru gets back?

Wibs

JohnH · Mar 29, 2011

What I can say is that versions 2.0 and 3.0 of that font exist, and only version 3.0 has the char range A750 - A770 that you mention, which also displays fine in Character Map. There are still problems with that font: with the v3 font those chars will display in a Label and a TextBox, but not in a RichTextBox. The only problem I could see in Character Map was that those chars did not have a Unicode subgroup defined ('undefined'). Why the RichTextBox (which is an OS control) does not display them correctly beats me; I've never seen anything like it and would blame the font. The same erratic behaviour can be seen in Notepad and WordPad, which use the same controls. For example, pasting one such char into WordPad set to that font will simply change the font, as if the font didn't support that code point (when I tested, a font that supported Chinese GB2312 was substituted).

The code editor uses a single font to display the code and can only display chars from that font. The character code of any pasted char is still valid, though, so the correct char will display when the text is displayed with the designated font.

Wibs · Mar 29, 2011

I have v3.0 of that font, John, and the characters display fine in BabelMap but not in Character Map. This is the case both on my home PC and on the work PC. I am wondering if this could be a Regional and Language, Supplemental Language Support issue? BabelMap shows the group as Latin Extended-D, but I don't know whether that is because BabelMap simply says so or because that info has come from the font itself.

A respondent on the BabelMap forum said this:

The simple answer is that character map is (to put it politely) not very good, and has not been updated to reflect new versions of Unicode since it was created. The version that ships with XP is stuck in the world of Unicode 3.0 from 1999, and will only display characters that were defined in Unicode 3.0 (i.e. only 49,259 out of the current total of 109,449 characters). The versions of character map that ship with Vista and 7 are not much better.

Wibs

JohnH · Mar 29, 2011

Since I use Vista and Character Map displays them, I take it you are using XP. Anyway, the Vista RichTextBox control also will not display those chars for that font, so I can only advise you to use a different font, or to use the Label or TextBox control to display them.
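
When trying other fonts, one possible way to check up front whether a font itself maps a given code point is the Win32 function GetGlyphIndices, called through P/Invoke. This is a sketch under assumptions: the helper FontHasGlyph and the module name are hypothetical, the check only covers characters that fit in a single Char (the Basic Multilingual Plane), and as far as I know it reports on the selected font alone, without the font substitution a control may apply when displaying text:

```vbnet
Imports System
Imports System.Drawing
Imports System.Runtime.InteropServices

Module GlyphCheck
    Private Const GGI_MARK_NONEXISTING_GLYPHS As UInteger = 1UI
    Private Const GDI_ERROR As UInteger = &HFFFFFFFFUI
    Private Const MissingGlyph As UShort = &HFFFFUS

    <DllImport("gdi32.dll", CharSet:=CharSet.Unicode)> _
    Private Function GetGlyphIndicesW(ByVal hdc As IntPtr, ByVal text As String, _
            ByVal count As Integer, <Out()> ByVal glyphs() As UShort, _
            ByVal flags As UInteger) As UInteger
    End Function

    <DllImport("gdi32.dll")> _
    Private Function SelectObject(ByVal hdc As IntPtr, ByVal obj As IntPtr) As IntPtr
    End Function

    <DllImport("gdi32.dll")> _
    Private Function DeleteObject(ByVal obj As IntPtr) As Boolean
    End Function

    ' Hypothetical helper: True when the font contains a glyph for the character.
    Function FontHasGlyph(ByVal font As Font, ByVal ch As Char) As Boolean
        Using bmp As New Bitmap(1, 1)
            Using g As Graphics = Graphics.FromImage(bmp)
                Dim hdc As IntPtr = g.GetHdc()
                Dim hFont As IntPtr = font.ToHfont()
                Dim oldFont As IntPtr = SelectObject(hdc, hFont)
                Try
                    Dim glyphs(0) As UShort
                    Dim result As UInteger = GetGlyphIndicesW(hdc, ch.ToString(), 1, _
                        glyphs, GGI_MARK_NONEXISTING_GLYPHS)
                    If result = GDI_ERROR Then Return False
                    ' With the flag set, characters without a glyph are reported as &HFFFF.
                    Return glyphs(0) <> MissingGlyph
                Finally
                    ' Restore the device context and release the native font handle.
                    SelectObject(hdc, oldFont)
                    DeleteObject(hFont)
                    g.ReleaseHdc(hdc)
                End Try
            End Using
        End Using
    End Function

    Sub Main()
        Using arial As New Font("Arial", 12.0F)
            For Each code As Integer In New Integer() {&HFEF9, &HA752}
                Console.WriteLine("U+{0:X4}: {1}", code, FontHasGlyph(arial, ChrW(code)))
            Next
        End Using
    End Sub
End Module
```

The results depend on the font version installed, so the output will differ between machines.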

Wibs · Mar 29, 2011

Thanks.

I will try a couple of other fonts that cover those symbols, and I will try a TextBox instead of a RichTextBox, this evening, and report back the results.

Wibs
