Microsoft Speech API 4.0

Unicode Values for IPA Characters

The following table lists groups of IPA characters and the Unicode blocks in which they can be found. The U+ prefix is a convention that identifies a Unicode value; each value is a 16-bit number written in hexadecimal.

IPA Characters                  Unicode block
Standard Latin                  U+0041 -- U+00FF
European and Extended Latin     U+0100 -- U+01F0
Standard phonetic characters    U+0250 -- U+02AF
Modifier letters (spacing)      U+02B0 -- U+02FF
Diacritical marks (nonspacing)  U+0300 -- U+036F
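
As an illustration of the U+ notation and the ranges above, the following Python sketch converts between characters and U+ values and reports which listed group a character falls in. The names IPA_BLOCKS, u_plus, parse_u_plus, and ipa_block are hypothetical and not part of the Speech API. The sketch also assumes that the European and Extended Latin range starts at U+0100, where the Unicode Latin Extended-A block begins.

```python
# Ranges taken from the table above. The U+0100 start of the second
# range is an assumption (start of the Latin Extended-A block).
IPA_BLOCKS = [
    ("Standard Latin", 0x0041, 0x00FF),
    ("European and Extended Latin", 0x0100, 0x01F0),
    ("Standard phonetic characters", 0x0250, 0x02AF),
    ("Modifier letters (spacing)", 0x02B0, 0x02FF),
    ("Diacritical marks (nonspacing)", 0x0300, 0x036F),
]


def u_plus(ch):
    """Format a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def parse_u_plus(text):
    """Return the character named by a string such as 'U+0259'."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"not a U+ value: {text!r}")
    return chr(int(text[2:], 16))


def ipa_block(ch):
    """Return the name of the listed group containing ch, or None."""
    code_point = ord(ch)
    for name, low, high in IPA_BLOCKS:
        if low <= code_point <= high:
            return name
    return None
```

Under these assumptions, ipa_block(parse_u_plus("U+0259")) returns "Standard phonetic characters". A character outside all five ranges, such as GREEK THETA (U+03B8), yields None.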

The symbols used for American English phonemes are listed below. Each phoneme symbol is accompanied by an example, as well as the IPA description, the Unicode name for the glyph shape used in the IPA standard phonetic charts, and the Unicode value. Some phonemic labels are described as diphthongs or affricate clusters. For these, it may be preferable to rely on the MS labels, rather than the Unicode clusters of their component phonemes, since some TTS engines will provide single combined data points for these phonemes, rather than synthesize them as combinations of separately modeled phonemes. In the Unicode names, 'LATIN' means 'LATIN SMALL LETTER' and 'GREEK' means 'GREEK SMALL LETTER'.

MS  Example               IPA Description                        Unicode name            Unicode
iy  feel, eve, me         front close unrounded                  LATIN I                 U+0069
ih  fill, hit, lid        front close unrounded (lax)            LATIN CAPITAL I         U+026A
ae  at, carry, gas        front open unrounded (tense)           LATIN AE                U+00E6
aa  father, ah, car       back open unrounded                    LATIN ALPHA             U+0251
ah  cut, bud, up          open-mid back unrounded                LATIN TURNED V          U+028C
ao  dog, lawn, caught     open-mid back round                    LATIN OPEN O            U+0254
ay  tie, ice, bite        diphthong with quality: aa + ih
ax  ago, comply           central close mid (schwa)              LATIN SCHWA             U+0259
ey  ate, day, tape        front close-mid unrounded (tense)      LATIN E                 U+0065
eh  pet, berry, ten       front open-mid unrounded               LATIN OPEN E            U+025B
er  turn, fur, meter      central open-mid unrounded rhoticized  LATIN SCHWA W/HOOK      U+025A
ow  go, own, tone         back close-mid rounded                 LATIN O                 U+006F
aw  foul, how, our        diphthong with quality: aa + uh
oy  toy, coin, oil        diphthong with quality: ao + ih
uh  book, pull, good      back close-mid unrounded (lax)         LATIN UPSILON           U+028A
uw  tool, crew, moo       back close round                       LATIN U                 U+0075
b   big, able, tab        voiced bilabial plosive                LATIN B                 U+0062
p   put, open, tap        voiceless bilabial plosive             LATIN P                 U+0070
d   dig, idea, wad        voiced alveolar plosive                LATIN D                 U+0064
t   talk, sat             voiceless alveolar plosive &           LATIN T                 U+0074
    meter                 alveolar flap                          LATIN R W/FISHHOOK      U+027E
g   gut, angle, tag       voiced velar plosive                   LATIN SCRIPT G          U+0067
k   cut, oaken, take      voiceless velar plosive                LATIN K                 U+006B
f   fork, after, if       voiceless labiodental fricative        LATIN F                 U+0066
v   vat, over, have       voiced labiodental fricative           LATIN V                 U+0076
s   sit, cast, toss       voiceless alveolar fricative           LATIN S                 U+0073
z   zap, lazy, haze       voiced alveolar fricative              LATIN Z                 U+007A
th  thin, nothing, truth  voiceless dental fricative             GREEK THETA             U+03B8
dh  then, father, scythe  voiced dental fricative                LATIN ETH               U+00F0
sh  she, cushion, wash    voiceless postalveolar fricative       LATIN ESH               U+0283
zh  genre, azure          voiced postalveolar fricative          LATIN EZH               U+0292
l   lid                   alveolar lateral approximant           LATIN L                 U+006C
    elbow, sail           velar lateral approximant              LATIN L W/MIDDLE TILDE  U+026B
r   red, part, far        retroflex approximant                  LATIN TURNED R          U+0279
y   yacht, onion, yard    palatal sonorant glide                 LATIN J                 U+006A
w   with, away            labiovelar sonorant glide              LATIN W                 U+0077
hh  help, ahead, hotel    voiceless glottal fricative            LATIN H                 U+0068
m   mat, amid, aim        bilabial nasal                         LATIN M                 U+006D
n   no, end, pan          alveolar nasal                         LATIN N                 U+006E
nx  sing, anger, drink    velar nasal                            LATIN ENG               U+014B
ch  chin, archer, march   voiceless alveolar affricate: t + sh                           U+02A7
jh  joy, agile, edge      voiced alveolar affricate: d + zh                              U+02A4
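
The table above can be read as a lookup from MS labels to Unicode values. The following Python sketch shows one way to apply it. The names MS_TO_UNICODE and ms_to_ipa are hypothetical, the dictionary holds only a subset of the rows, and the input is assumed to be an already separated sequence of labels, since how an engine delimits labels is not described here. The diphthongs ay, aw, and oy are omitted because the table lists no single Unicode value for them.

```python
# Subset of the table above: MS label -> Unicode value as listed there.
# Rows with two variants (t, l) use the first variant only.
MS_TO_UNICODE = {
    "iy": 0x0069, "ih": 0x026A, "ae": 0x00E6, "aa": 0x0251,
    "ah": 0x028C, "ao": 0x0254, "ax": 0x0259, "eh": 0x025B,
    "er": 0x025A, "ow": 0x006F, "uh": 0x028A, "uw": 0x0075,
    "t": 0x0074, "l": 0x006C, "hh": 0x0068, "nx": 0x014B,
    "th": 0x03B8, "dh": 0x00F0, "sh": 0x0283, "zh": 0x0292,
    "ch": 0x02A7, "jh": 0x02A4,
}


def ms_to_ipa(labels):
    """Convert a sequence of MS labels to a string of Unicode characters."""
    chars = []
    for label in labels:
        if label not in MS_TO_UNICODE:
            # Covers ay, aw, oy and any label left out of the subset.
            raise KeyError(f"no Unicode value listed for {label!r}")
        chars.append(chr(MS_TO_UNICODE[label]))
    return "".join(chars)
```

For example, ms_to_ipa(["th", "ih", "nx"]) yields the three characters U+03B8, U+026A, and U+014B. Where an engine models ch or jh as single units, passing the MS label rather than the component phonemes keeps that choice with the engine, as noted above.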

The following symbols can be used to construct phoneme strings and phonetic input to a TTS engine.

The precise effects may vary in different TTS engines.

MS       Description        Unicode name                       Unicode Usage/Effect
-        syllable boundary  HYPHEN-MINUS                       U+002D  separates syllables
#        word boundary      NUMBER SIGN                        U+0023  separates words
(space)  word boundary      SPACE                              U+0020  separates words
_        silence            UNDERLINE                          U+005F  indicates silent period
1        primary stress     MODIFIER LETTER VERTICAL LINE      U+02C8  precedes affected vowel
2        secondary stress   MODIFIER LETTER LOW VERTICAL LINE  U+02CC  precedes affected vowel
(blank)  word boundary      SPACE                              U+0020  separates words
.        period             FULL STOP                          U+002E  pitch fall, pause
?        question mark      QUESTION MARK                      U+003F  pitch rise, pause
!        exclamation        EXCLAMATION MARK                   U+0021  raised pitch range, pause
,        comma              COMMA                              U+002C  continuation rise, pause
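
The following Python sketch shows how the symbols above might be combined with phoneme labels to build a Unicode phoneme string. It is an illustration only: the names CONTROL_TO_UNICODE, PHONEMES, VOWELS, and encode are hypothetical, the input is assumed to be an already separated list of tokens, and PHONEMES holds just the four labels needed for the example. The one rule it enforces is the one stated in the table, that a stress mark precedes the vowel it affects.

```python
# MS symbol -> Unicode value, from the table above.
CONTROL_TO_UNICODE = {
    "-": 0x002D, "#": 0x0023, " ": 0x0020, "_": 0x005F,
    "1": 0x02C8, "2": 0x02CC,
    ".": 0x002E, "?": 0x003F, "!": 0x0021, ",": 0x002C,
}
STRESS = {"1", "2"}

# Vowel and diphthong labels from the phoneme table.
VOWELS = {"iy", "ih", "ae", "aa", "ah", "ao", "ay", "ax",
          "ey", "eh", "er", "ow", "aw", "oy", "uh", "uw"}

# Tiny subset of the phoneme table, enough for the example.
PHONEMES = {"hh": 0x0068, "ax": 0x0259, "l": 0x006C, "ow": 0x006F}


def encode(tokens, phonemes=PHONEMES):
    """Build a Unicode string from MS phoneme labels and symbols."""
    chars = []
    for i, token in enumerate(tokens):
        if token in STRESS:
            following = tokens[i + 1] if i + 1 < len(tokens) else None
            if following not in VOWELS:
                raise ValueError(f"stress mark {token!r} must precede a vowel")
            chars.append(chr(CONTROL_TO_UNICODE[token]))
        elif token in CONTROL_TO_UNICODE:
            chars.append(chr(CONTROL_TO_UNICODE[token]))
        elif token in phonemes:
            chars.append(chr(phonemes[token]))
        else:
            raise KeyError(f"unknown token {token!r}")
    return "".join(chars)
```

With these assumptions, encode(["hh", "ax", "-", "l", "1", "ow"]) produces U+0068, U+0259, U+002D, U+006C, U+02C8, U+006F: two syllables with primary stress on the final vowel.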

Use the Prn control tag to indicate how to pronounce text by passing the phonetic equivalent to the engine. For information about Prn, see "Text-to-Speech Control Tags."

Note these rules:

For more information about IPA characters and Unicode, see the following publications:

© 1995-1998 Microsoft Corporation. All rights reserved.