codepage

Started by kris99, 2012/06/24, 20:28:09


kris99

Hi,

I have a problem when I extract a RAR file that was created on a Windows machine and whose file names contain special characters such as the German äöüÄÖÜß.

In Dolphin the characters are displayed as �. On the command line they are shown as question marks.

Renaming these files with Dolphin is not possible, but qmv works.

It must be a problem with the different code pages the two operating systems use. I have tried convmv with the following "from" encodings, but with no luck: ascii, latin1, cp437, cp850, cp1250, cp1252, iso-8859-1, iso-8859-15 and all UTF encodings.

With qmv I get numbers for these characters that I can't match to any code page I know. IIRC Windows uses UTF-16LE on its file systems. I use en_US.UTF-8 on my Linux system.

Some samples of the numbers shown in qmv for these special characters:

nr in qmv   hex      char
\201        0xC9     ü
\204        0xCC     ä
\216        0xD8     Ä
\224        0xE0     ö
\231        0xE7     Ö
\232        0xE8     Ü
\341        0x0155   ß
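
One possible reading of these numbers, offered as an assumption rather than something confirmed in this thread: backslash escapes of this form are normally octal, not decimal. Read as octal, \201 is byte 0x81, \204 is 0x84, and so on. The short Python sketch below decodes each of those bytes with the DOS code pages cp437 and cp850 and compares the result with the expected character. The names samples and decode_byte are made up for this illustration.

```python
# Numbers as shown by qmv, read as OCTAL escapes (an assumption),
# paired with the character that was expected in the file name.
samples = {
    0o201: "ü",
    0o204: "ä",
    0o216: "Ä",
    0o224: "ö",
    0o231: "Ö",
    0o232: "Ü",
    0o341: "ß",
}


def decode_byte(value: int, encoding: str) -> str:
    """Decode a single raw byte with the given single-byte code page."""
    return bytes([value]).decode(encoding, errors="replace")


if __name__ == "__main__":
    for value, expected in samples.items():
        cp437 = decode_byte(value, "cp437")
        cp850 = decode_byte(value, "cp850")
        # Show the octal escape, the real hex value of the byte,
        # and what each code page makes of it.
        print(f"\\{value:o}  0x{value:02X}  cp437={cp437}  cp850={cp850}  expected={expected}")
```

If every row shows the expected character, the names in the archive were most likely stored in a DOS code page rather than in UTF-16 or a Windows ANSI code page.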


Any idea how to get rid of this behavior?

dibl

Here's one possible solution (I have not tried it myself):  https://bugs.kde.org/show_bug.cgi?id=269089

kris99

Thank you for your hint. Unfortunately setting LANG doesn't change anything.

der_bud

You could try the following, with no guarantee and not very elegant: create an NTFS partition (or use an existing one) and mount it with the option nls=utf8, for example:
mount -t ntfs-3g -o defaults,nls=utf8 /dev/sdxy /media/foo
Unrar your files into this NTFS mount point. If the file names look okay, copy them to your Linux file system and check the umlauts. If the file names are still corrupted after unpacking, remount with a different nls option, such as nls=iso8859-1 *.


(*Source: Ubuntu wiki: nls=iso8859-1 as an alternative character set, in case display errors occur despite UTF-8)
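
A way to avoid the NTFS detour would be to rename the extracted files directly. The Python sketch below is only an illustration and has not been tested against the archive from this thread. It assumes the file names are stored as cp850 bytes, which is a guess, and the helper names fixed_name and rename_tree are hypothetical. It works on raw bytes, so the current locale does not matter, and it defaults to a dry run that only reports what would be renamed.

```python
import os


def fixed_name(raw: bytes, source_encoding: str = "cp850") -> bytes:
    """Return the name re-encoded as UTF-8, or unchanged if it already is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return raw  # valid UTF-8 (this includes plain ASCII), leave it alone
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is in source_encoding.
        return raw.decode(source_encoding).encode("utf-8")


def rename_tree(root: bytes, source_encoding: str = "cp850", dry_run: bool = True):
    """Walk root bottom-up and rename entries whose names are not valid UTF-8.

    Returns the list of (old_path, new_path) pairs. Nothing is renamed
    while dry_run is True.
    """
    changes = []
    # A bytes root makes os.walk yield bytes names, untouched by the locale.
    # Bottom-up order renames the contents of a directory before the directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = fixed_name(name, source_encoding)
            if new != name:
                src = os.path.join(dirpath, name)
                dst = os.path.join(dirpath, new)
                changes.append((src, dst))
                if not dry_run:
                    os.rename(src, dst)
    return changes
```

Calling rename_tree(b"/path/to/extracted") first and inspecting the returned pairs shows whether the umlauts come out right before anything is changed. Passing dry_run=False then performs the renames. The path here is a placeholder.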