

HTML :: Special Characters

Previously we saw the special code &nbsp; (the non-breaking space). It is known as a character entity reference. A character entity reference starts with an ampersand (&) and ends with a semicolon (;).

Because the ampersand (&) is used for character entities, to display an ampersand you should use the special entity &amp;.

Similarly, the angle brackets (<, >) used for tags can be displayed with &lt; and &gt; for less-than and greater-than.
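As a rough illustration of these three replacements, here is a minimal Python sketch. Python is not part of this module; it is used only because its standard html library performs the same substitutions. The sample string is made up for the example.

```python
import html

# A made-up line of text containing all three special characters.
raw = "if a < b && b > c"

# html.escape swaps &, < and > for their entities.
# quote=False leaves quotation marks untouched.
escaped = html.escape(raw, quote=False)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c

# html.unescape does what a web browser does when it displays the text.
print(html.unescape(escaped))  # if a < b && b > c
```

Notice that the ampersand is replaced first, which is why the & inside &lt; is not escaped a second time.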

Here is another example:

HTML

Jamie Sal&eacute; commence &agrave; patiner &agrave; l'&acirc;ge de 5 ans.

web browser

Jamie Salé commence à patiner à l'âge de 5 ans.
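The decoding step a browser performs on this sentence can be approximated outside a browser. The sketch below assumes Python's standard html library as a stand-in for the browser; the variable names are only illustrative.

```python
import html
from html.entities import name2codepoint

source = "Jamie Sal&eacute; commence &agrave; patiner &agrave; l'&acirc;ge de 5 ans."

# Replace each named entity with the character it stands for.
rendered = html.unescape(source)
print(rendered)  # Jamie Salé commence à patiner à l'âge de 5 ans.

# Each entity name is just a label for a character number.
print(name2codepoint["eacute"])  # 233
print(chr(name2codepoint["eacute"]))  # é
```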

 

A collection of entities is available on this Online Chart.

Unicode

Named character entities are convenient for a few characters, but in a previous module we saw how the Unicode standard can be used to represent characters from languages all over the world.

To use a Unicode character in HTML, write its number between the ampersand (&) and the semicolon (;), with a number sign (#) in front of the number.

For example, to display the happy face Unicode character 128522 (a decimal, or base 10, number) in HTML, you write &#128522; (😊).

HTML

Hello &#128522;

web browser

Hello 😊
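If you have the character but not its number, the number can be looked up programmatically. The following is a minimal Python sketch; decimal_entity is a hypothetical helper written for this example, not part of any library.

```python
import html


def decimal_entity(char: str) -> str:
    """Build the decimal character entity for a single character."""
    # ord() gives the character's Unicode number in decimal.
    return f"&#{ord(char)};"


smiley = "\U0001F60A"  # the happy face character
print(ord(smiley))  # 128522
print(decimal_entity(smiley))  # &#128522;

# Decoding the entity gives back the original character.
print(html.unescape("Hello &#128522;") == "Hello " + smiley)  # True
```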

 

In practice, Unicode characters are more often identified by their hexadecimal (hex) codes. To use a hex code in an entity, add an x after the number sign (#x) to indicate that the number is in hex.

HTML

My six-year-old daughter's favourite emoji is &#x1F4A9;.

web browser

My six-year-old daughter's favourite emoji is 💩.
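The decimal and hex forms are two spellings of the same number, so either one produces the same character. A short Python sketch of the conversion follows; hex_entity is a hypothetical helper named for this example only.

```python
import html


def hex_entity(char: str) -> str:
    """Build the hex character entity for a single character."""
    # :X formats the Unicode number as uppercase hexadecimal.
    return f"&#x{ord(char):X};"


emoji = "\U0001F4A9"
print(hex_entity(emoji))  # &#x1F4A9;
print(int("1F4A9", 16))  # 128169, the same number in decimal

# Both spellings decode to the same character.
print(html.unescape("&#x1F4A9;") == html.unescape("&#128169;"))  # True
```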

 

UTF-8

(The following is advanced content not required for this course)

If your text editor supports UTF-8, you might be able to just copy and paste a Unicode character directly into your HTML file. You should make sure your text editor is actually saving the file in UTF-8.

If you use Unicode in your HTML this way, then you should add a <meta> tag to the <head> section with a charset attribute indicating that the HTML file is in UTF-8.

HTML
<head>
  <meta charset="UTF-8">
</head>

If you are using a lot of Unicode (for example, writing in a non-English language), then this is usually the best approach.

However, this method is slightly more susceptible to display problems across different computers and web browsers, for example when the file is saved or read in an encoding other than UTF-8. If you are only using a few Unicode characters, it is usually safer to explicitly add the code with a character entity.
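To see what such a display problem looks like, the sketch below encodes a short word as UTF-8 and then decodes the bytes with the wrong encoding. Latin-1 is an assumption chosen to stand in for "some other encoding"; the exact garbage you see in practice depends on which encoding is wrongly used.

```python
text = "Salé"

# Saving as UTF-8 stores é as two bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'Sal\xc3\xa9'

# Reading those bytes as Latin-1 turns each byte into its own character.
wrong = utf8_bytes.decode("latin-1")
print(wrong)  # SalÃ©

# Reading them as UTF-8 recovers the original text.
right = utf8_bytes.decode("utf-8")
print(right)  # Salé

# The entity form uses only plain ASCII characters, so it can be
# encoded as ASCII without any error.
entity_bytes = "Sal&eacute;".encode("ascii")
print(entity_bytes)  # b'Sal&eacute;'
```

Because the entity version contains only ASCII characters, it reads the same in the common encodings, which is why it is the safer choice for a handful of special characters.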