
Comments (22)

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

Hi,

Maybe we can use this code:

https://github.com/Opendigitalradio/ODR-PadEnc/blob/master/src/charset.h
https://github.com/Opendigitalradio/ODR-PadEnc/blob/master/src/charset.cpp

Thanks!

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

Hi,

I didn't get valid UTF-8 back. With the following change, however, everything seems OK here:

diff --git a/library/src/backend/charsets.cpp b/library/src/backend/charsets.cpp
index b9bbbb8..3135c46 100644
--- a/library/src/backend/charsets.cpp
+++ b/library/src/backend/charsets.cpp
@@ -69,6 +69,24 @@ static const unsigned short ebuLatinToUcs2[] = {
 /* 0xf8 - 0xff */ 0xfe,   0x014b, 0x0155, 0x0107, 0x015b, 0x017a, 0x0167, 0xff
 };

+static const char* utf8_encoded_EBU_Latin[] = {
+"\0", "Ę", "Į", "Ų", "Ă", "Ė", "Ď", "Ș", "Ț", "Ċ", "\n","\v","Ġ", "Ĺ", "Ż", "Ń",
+"ą", "ę", "į", "ų", "ă", "ė", "ď", "ș", "ț", "ċ", "Ň", "Ě", "ġ", "ĺ", "ż", "\u0082",
+" ", "!", "\"","#", "ł", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/",
+"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?",
+"@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
+"P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "Ů", "]", "Ł", "_",
+"Ą", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
+"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "«", "ů", "»", "Ľ", "Ħ",
+"á", "à", "é", "è", "í", "ì", "ó", "ò", "ú", "ù", "Ñ", "Ç", "Ş", "ß", "¡", "Ÿ",
+"â", "ä", "ê", "ë", "î", "ï", "ô", "ö", "û", "ü", "ñ", "ç", "ş", "ğ", "ı", "ÿ",
+"Ķ", "Ņ", "©", "Ģ", "Ğ", "ě", "ň", "ő", "Ő", "€", "£", "$", "Ā", "Ē", "Ī", "Ū",
+"ķ", "ņ", "Ļ", "ģ", "ļ", "İ", "ń", "ű", "Ű", "¿", "ľ", "°", "ā", "ē", "ī", "ū",
+"Á", "À", "É", "È", "Í", "Ì", "Ó", "Ò", "Ú", "Ù", "Ř", "Č", "Š", "Ž", "Ð", "Ŀ",
+"Â", "Ä", "Ê", "Ë", "Î", "Ï", "Ô", "Ö", "Û", "Ü", "ř", "č", "š", "ž", "đ", "ŀ",
+"Ã", "Å", "Æ", "Œ", "ŷ", "Ý", "Õ", "Ø", "Þ", "Ŋ", "Ŕ", "Ć", "Ś", "Ź", "Ť", "ð",
+"ã", "å", "æ", "œ", "ŵ", "ý", "õ", "ø", "þ", "ŋ", "ŕ", "ć", "ś", "ź", "ť", "ħ"};
+
 std::string toStringUsingCharset (const char* buffer,
                                  CharacterSet charset, int size) {
 std::string  s;
@@ -91,11 +109,8 @@ uint16_t i;
           case EbuLatin:
           default:
              for (i = 0; i < length; i++)
-                if (buffer [i] & 0x80) {
-                   uint8_t c0 =  (0xc0 | (((uint8_t)buffer [i]) >> 6));
-                   uint8_t c1 =  ((buffer [i] & 0x3f) | 0x80);
-                   s. push_back (c0);
-                   s. push_back (c1);
+                if (buffer [i] & 0xff) {
+                    s. append (utf8_encoded_EBU_Latin[buffer[i] & 0xff]);
                 }
                 else
                    s. push_back (buffer [i]);

$ dab-raw-3 -F 20171226_092958_12B.iq
dab_cmdline V 1.0alfa,
 	                  Copyright 2017 J van Katwijk, Lazy Chair Computing
opt = F
ofdm word gestart
Period = 8000
End of file, restarting
there might be a DAB signal here

no ensemble data found, fatal
BRF 1            (6366) is part of the ensemble
La 1ère Wallonie (6351) is part of the ensemble
TPEG_PACKET      (data) (E0606361) is part of the ensemble
End of file, restarting
Classic 21       (6354) is part of the ensemble
ensemble RTBF DAB         is (6005) recognized
Test Musiq3 +    (6358) is part of the ensemble
BRF 2            (6367) is part of the ensemble
TARMAC           (6357) is part of the ensemble
Pure             (6355) is part of the ensemble
Musiq3           (6353) is part of the ensemble
VivaCité         (6052) is part of the ensemble
Test Classic 21+ (6356) is part of the ensemble
La 1ère BXL      (6951) is part of the ensemble
End of file, restarting
^C

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

Just a note: characters 1 to 127 will be appended to the string unchanged, although the entries of utf8_encoded_EBU_Latin at those positions do not all match the ASCII/ISO ones.

E.g. "\x01" maps to "Ę" in EBU Latin, but "\x01" in ASCII is ^A (SOH).

So the following is still required, if I am not mistaken.

diff --git a/library/src/backend/charsets.cpp b/library/src/backend/charsets.cpp
index dcf5221..f030357 100644
--- a/library/src/backend/charsets.cpp
+++ b/library/src/backend/charsets.cpp
@@ -110,11 +110,8 @@ uint16_t i;
           case EbuLatin:
           default:
              for (i = 0; i < length; i++)
-                if (buffer [i] & 0x80) {
-                   if (buffer [i] & 0xff) {
+                if (buffer [i] & 0xff)
                       s. append (utf8_encoded_EBU_Latin [buffer[i] & 0xff]);
-                   }
-                }
                 else
                    s. push_back (buffer [i]);
        }

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

I am sorry once again, but this will work only for positions >= 128.

What about "Ę", (0x01), "Į", (0x02) etc?

It seems that the EBU Latin character set defines those positions differently from the normal ASCII control characters.
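
To make the point concrete, here is a minimal sketch (not the dab-cmdline code). The helper names `ebuByteToUtf8` and `ebuToUtf8` are hypothetical, and only a handful of positions from the table posted earlier are included. The fallback for the remaining positions below 0x80 assumes they agree with ASCII, and unlisted positions from 0x80 up are replaced by "?" because this sketch does not carry the full table.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps one EBU Latin byte to its UTF-8 encoding.
// Only a few positions from the table in this thread are listed.
std::string ebuByteToUtf8(std::uint8_t b) {
    switch (b) {
        case 0x01: return "\xC4\x98"; // Ę (U+0118), SOH in ASCII
        case 0x02: return "\xC4\xAE"; // Į (U+012E), STX in ASCII
        case 0x24: return "\xC5\x82"; // ł (U+0142), dollar sign in ASCII
        case 0x5C: return "\xC5\xAE"; // Ů (U+016E), backslash in ASCII
        case 0x5E: return "\xC5\x81"; // Ł (U+0141), caret in ASCII
        default: break;
    }
    // Assumption: every other position below 0x80 matches ASCII.
    if (b < 0x80)
        return std::string(1, static_cast<char>(b));
    // Placeholder: the full 256-entry table is not part of this sketch.
    return "?";
}

// Looks up every byte, so positions below 0x80 are translated as well.
std::string ebuToUtf8(const char* buffer, int length) {
    std::string s;
    for (int i = 0; i < length; i++)
        s += ebuByteToUtf8(static_cast<std::uint8_t>(buffer[i]));
    return s;
}
```

A test on the high bit (`buffer [i] & 0x80`) never reaches the first five cases, which is why the lookup has to be done for every byte.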

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

Is this table wrong, then? https://github.com/Opendigitalradio/ODR-PadEnc/blob/master/src/charset.cpp#L38

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

It seems OK according to ETSI TS 101 756 v1.8.1 (page 41):

https://worlddabeureka.org/2015/08/03/issue-26-new-latin-based-character-set-for-dab/

http://www.etsi.org/deliver/etsi_ts/101700_101799/101756/01.08.01_60/ts_101756v010801p.pdf

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

I think we should handle EBU Latin separately from ISO Latin and add a utf16to8 conversion (e.g. from utfcpp).

0000	Complete EBU Latin based repertoire - see annex C
0100	ISO Latin Alphabet No. 1 (see ISO/IEC 8859-1 [8]) 
0110	ISO/IEC 10646 [26] using UCS-2 transformation format, big endian byte order 
1111	ISO/IEC 10646 [26] using UTF-8 transformation format

Most probably today most people still use Latin1 :)
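
The four codes listed above could be decoded along these lines. This is a small sketch only: the enumerator names and the helper `charsetFromCode` are assumptions, not existing dab-cmdline API. Only the numeric values come from the list.

```cpp
#include <cstdint>

// Enumerator names are assumptions; the values are the 4-bit codes listed above.
enum class CharacterSet : std::uint8_t {
    EbuLatin    = 0x00, // 0000
    IsoLatin    = 0x04, // 0100
    UnicodeUcs2 = 0x06, // 0110
    UnicodeUtf8 = 0x0F  // 1111
};

// Hypothetical helper: returns true and sets 'out' if the 4-bit code is one
// of the four listed above, and returns false for any other value.
bool charsetFromCode(std::uint8_t code, CharacterSet& out) {
    switch (code & 0x0F) {
        case 0x00: out = CharacterSet::EbuLatin;    return true;
        case 0x04: out = CharacterSet::IsoLatin;    return true;
        case 0x06: out = CharacterSet::UnicodeUcs2; return true;
        case 0x0F: out = CharacterSet::UnicodeUtf8; return true;
        default:   return false;
    }
}
```

Rejecting unknown codes explicitly avoids silently treating them as EBU Latin.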

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

Hi,

I think the following will be fine, until somebody uses UCS2 encoding.

diff --git a/library/includes/backend/charsets.h b/library/includes/backend/charsets.h
index 4851443..399b481 100644
--- a/library/includes/backend/charsets.h
+++ b/library/includes/backend/charsets.h
@@ -33,8 +33,9 @@
  */
 typedef enum {
     EbuLatin   = 0x00, // Complete EBU Latin based repertoire - see annex C
-    UnicodeUcs2 = 0x06,
-    UnicodeUtf8 = 0x0F
+    IsoLatin    = 0x04, // ISO Latin Alphabet No. 1 (see ISO/IEC 8859-1 [8])
+    UnicodeUcs2 = 0x06, // ISO/IEC 10646 [26] using UCS-2 transformation format, big endian byte order
+    UnicodeUtf8 = 0x0F  // ISO/IEC 10646 [26] using UTF-8 transformation format
 } CharacterSet;

 /**
diff --git a/library/src/backend/charsets.cpp b/library/src/backend/charsets.cpp
index cd8d6db..202421e 100644
--- a/library/src/backend/charsets.cpp
+++ b/library/src/backend/charsets.cpp
@@ -100,21 +100,20 @@ uint16_t i;
           length = size;

        switch (charset) {
-//        case UnicodeUcs2:
-//           s = std::string::fromUtf16 ((const ushort*) buffer, length);
-//           break;
+          case EbuLatin:
+              for (i = 0; i < length; i++)
+                 s. append (utf8_encoded_EBU_Latin [buffer[i] & 0xff]);
+              break;

-          case UnicodeUtf8:
-             break;
+           case UnicodeUcs2:
+              throw std::logic_error("UnicodeUcs2 to Utf8 not yet implemented");
+              break;

-          case EbuLatin:
+          case IsoLatin:
+          case UnicodeUtf8:
           default:
-             for (i = 0; i < length; i++)
-                if (buffer [i] & 0x80) {       // extended char
-                   s. append (utf8_encoded_EBU_Latin [buffer[i] & 0xff]);
-                }
-                else
-                   s. push_back (buffer [i]);
+              for (i = 0; i < length; i++)
+                s. push_back (buffer [i]);
        }

        return s;
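
One caveat with the `IsoLatin` branch above: ISO 8859-1 bytes in the range 0x80..0xFF are not valid UTF-8 when copied unchanged. A minimal sketch of a conversion follows. The helper name `latin1ToUtf8` is hypothetical and not part of dab-cmdline. It relies on the fact that ISO 8859-1 byte values are identical to the Unicode code points U+0000..U+00FF.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: converts an ISO 8859-1 buffer to a UTF-8 string.
std::string latin1ToUtf8(const char* buffer, int length) {
    std::string s;
    for (int i = 0; i < length; i++) {
        const std::uint8_t b = static_cast<std::uint8_t>(buffer[i]);
        if (b < 0x80) {
            // ASCII range: one byte, unchanged.
            s.push_back(static_cast<char>(b));
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            s.push_back(static_cast<char>(0xC0 | (b >> 6)));
            s.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return s;
}
```

For example, the Latin-1 byte 0xE8 ("è") becomes the two bytes 0xC3 0xA8.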

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

Great!

I created a PR to fix a few typos after the latest merge.

from dab-cmdline.

JvanKatwijk avatar JvanKatwijk commented on August 17, 2024

from dab-cmdline.

athoik avatar athoik commented on August 17, 2024

I guess we are done here. If a broadcast using UCS-2 ever appears, we will need a UCS-2 to UTF-8 function ;)
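
For reference, such a function could look roughly like this. It is a sketch under assumptions, not code from dab-cmdline: the name `ucs2BeToUtf8` is hypothetical, the input is taken to be big-endian as stated in the charset list, a trailing odd byte is ignored, and values in the surrogate range (which UCS-2 does not use) are replaced by U+FFFD.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: converts big-endian UCS-2 to UTF-8.
std::string ucs2BeToUtf8(const char* buffer, int length) {
    std::string s;
    for (int i = 0; i + 1 < length; i += 2) {
        std::uint16_t cp = static_cast<std::uint16_t>(
            (static_cast<std::uint8_t>(buffer[i]) << 8) |
             static_cast<std::uint8_t>(buffer[i + 1]));
        // Surrogates are not valid UCS-2; substitute the replacement character.
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;
        if (cp < 0x80) {
            // One byte: 0xxxxxxx
            s.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            // Two bytes: 110xxxxx 10xxxxxx
            s.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            s.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            s.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            s.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            s.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return s;
}
```

Since UCS-2 only covers the Basic Multilingual Plane, three bytes per character are always enough.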

from dab-cmdline.
