Hi everybody! I'm a complete newbie, so first of all I want to apologize if I'm posting in the wrong place. To cut to the chase:

Issue 1: I have some PDF files containing only pinyin, and I tried to OCR them. I've had no difficulty at all with the plain Latin alphabet, but the following set of characters { o ā ɑ̄ ē ī ō ū ǖ Ā Ē Ī Ō Ū Ǖ á ɑ́ é í ó ú ǘ Á É Í Ó Ú Ǘ ǎ ɑ̌ ě ǐ ǒ ǔ ǚ Ǎ Ě Ǐ Ǒ Ǔ Ǚ à ɑ̀ è ì ò ù ǜ À È Ì Ò Ù Ǜ a ɑ e i o u ü A E I O U Ü } is not recognized by ABBYY, Acrobat, Tesseract, etc. I've tried training them, combining different recognition languages, and a million other things, including asking in dozens of forums, but no luck.
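In case it helps anyone reproduce the problem, here is a rough Python sketch that rebuilds the set above from the base vowels plus the four tone marks, using only the standard `unicodedata` module. The names `BASES`, `TONE_MARKS`, `toned` and `pinyin_vowels` are just mine for illustration. One thing it shows: the toned forms of "ɑ" have no single precomposed code point and stay as a base letter plus a combining mark, which might be part of why they are so troublesome (that is only my guess).

```python
import unicodedata

# Base vowels from the list above: a, ɑ (U+0251), e, i, o, u, ü, then capitals.
BASES = "a\u0251eiou\u00fc" + "AEIOU\u00dc"

# Combining diacritics for the four pinyin tones.
TONE_MARKS = {
    1: "\u0304",  # macron
    2: "\u0301",  # acute accent
    3: "\u030c",  # caron
    4: "\u0300",  # grave accent
}


def toned(base, tone):
    """Return base + tone mark, composed to a single code point where one exists."""
    return unicodedata.normalize("NFC", base + TONE_MARKS[tone])


def pinyin_vowels():
    """List every toned vowel (tones 1 to 4), followed by the untoned vowels."""
    chars = []
    for tone in (1, 2, 3, 4):
        chars.extend(toned(base, tone) for base in BASES)
    chars.extend(BASES)
    return chars


if __name__ == "__main__":
    for ch in pinyin_vowels():
        # Print each character with its code points to spot combining sequences.
        print(ch, " ".join(f"U+{ord(c):04X}" for c in ch))
```

The output could be used as sample text when training an OCR engine, or to check what actually comes out of the OCR step.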

Issue 2: I also have some resources in true-PDF format with those damn embedded font subsets, and when I try to copy and paste text to rearrange the layout, the file becomes unmanageable because the text becomes completely illegible. I've installed most of the fonts that I didn't have on my PC, and also the PitStop plugin, but I cannot find a solution. For example, I'd like to replace, throughout the file, all the characters that use a certain embedded font subset with a different font, while keeping the original character shapes.

As you can see, issue 1 and issue 2 are related, inasmuch as solving issue 1 would also put an end to issue 2.

So I hope to hear good news soon.
Thanks in advance