You can't make an ePub in anything but UTF-8, and because I've hard-coded the foreign language in, I used UTF-8 for those too. So both are UTF-8, but only the HTML displays properly.
Rather, I suggest phonetic translation to a Latin alphabet so I can get a feel for what it would have sounded like to be listening to what was said.
Sorry, D.S. I never addressed your phonetic translation idea. Unlike you, I rarely approve of phonetic translations, as they never sound authentic to me. I've traveled enough that I can recall what a language sounds like, even if I can't translate it.
The bit about 'my sense of professionalism' concerned my fretting about something that readers can't read (and thus can't tell is wrong) anyway, but it bugs me that--after going to that much work to get it right--I can't display the correct text.
If anyone disagrees with me about typing out phonetic translations, let me know and I might reconsider, but I just can't remember many examples in literature where it works well.
Take another look at your text. It's likely those characters are Unicode combining accents. That means there are two or more glyphs drawn in the same space. (The way CRT terminals used to draw underlines with "A", backspace, "_".)
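If it helps, here is a rough way to check for combining accents. It is only a sketch using Python's standard `unicodedata` module; the function name `combining_marks` and the sample string are made up for illustration and are not taken from your book.

```python
import unicodedata

def combining_marks(text):
    """Return (index, code point, name) for every combining mark in text."""
    found = []
    for i, ch in enumerate(text):
        # General categories Mn, Mc and Me are the Unicode "mark" categories,
        # i.e. characters that attach to the preceding base character.
        if unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

# Hypothetical sample: 'e' followed by U+0301 COMBINING ACUTE ACCENT
sample = "e\u0301"
print(combining_marks(sample))
```

Paste a line of the problem text in place of `sample`. An empty list would suggest the text has no combining marks and the trouble lies elsewhere.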
Your idea sounded good, but when I examined the text letter by letter, I found that the higher codes are used for the combinations. In fact, a few of them include the trailing/preceding space (since it's a right-to-left language). Again, it seems like the ePub developers instituted a top-level character limit which eliminates the use of certain languages. Why they'd do that, I have no clue (other than limiting overhead). They might have decided combining accent marks and spacing were superfluous.
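For anyone who wants to repeat the letter-by-letter check, something along these lines would list each character's code point and show whether it is a single precombined character that Unicode can split into a base letter plus marks. Treat it as a sketch only: `describe` is a made-up helper, and the Hebrew sample character is an assumption, since the language in question isn't named here.

```python
import unicodedata

def describe(text):
    """List each character's code point, name, and canonical decomposition."""
    rows = []
    for ch in text:
        # NFD splits a precombined character into base letter + combining marks
        decomposed = unicodedata.normalize("NFD", ch)
        parts = [f"U+{ord(c):04X}" for c in decomposed] if decomposed != ch else []
        rows.append({
            "char": ch,
            "code": f"U+{ord(ch):04X}",
            "name": unicodedata.name(ch, "<unnamed>"),
            "decomposes_to": parts,
        })
    return rows

# Hypothetical sample: U+FB2A, a precombined Hebrew letter with a dot
for row in describe("\uFB2A"):
    print(row["code"], row["name"], row["decomposes_to"])
```

A non-empty `decomposes_to` means the character is one of the higher combined codes. Swapping in the decomposed sequence, or the reverse, is one thing to try when a reader's font lacks the glyph, though there's no guarantee a given ePub reader handles either form.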