smhg / gettext-parser Goto Github PK

View Code? Open in Web Editor NEW

158.0 158.0 44.0 389 KB

Parse and compile gettext po and mo files, nothing more, nothing less

License: MIT License

JavaScript 100.00%

gettext

gettext-parser's People

Contributors

Stargazers

Watchers

gettext-parser's Issues

Line break outputting incorrect format

I am passing the following object to the po compiler

{
    msgctxt: '123',
    msgid: "This is a\r\nTest",
    msgstr: "This is a\r\nTest"
 }

and the result I get back is...

msgctxt "123"
msgid ""
"This is a\\r\\n"
"Test"
msgstr ""
"This is a\\r\\n"
"Test"

I believe the right format returned should be:

msgctxt "123"
msgid "This is a\\r\\nTest"
msgstr "This is a\\r\\nTest"

Is this the intended behavior?

test case failure

Hi,
I am packaging gettext-parser module for Fedora and running the tests as

$ mocha -R spec test/*


  Folding tests
    ✓ Short line, no folding 
    ✓ Short line, force fold with newline 
    ✓ Long line 

  MO Compiler
    UTF-8
      1) should compile
    Latin-13
      2) should compile

  MO Parser
    UTF-8
      ✓ should parse 
    Latin-13
      ✓ should parse 

  PO Compiler
    UTF-8
      ✓ should compile 
    Latin-13
      ✓ should compile 

  PO Parser
    UTF-8
      ✓ should parse 
    UTF-8 as a string
      ✓ should parse 
    Stream input
      ✓ should parse (79ms)
    Latin-13
      ✓ should parse 


  11 passing (231ms)
  2 failing

  1) MO Compiler UTF-8 should compile:

      AssertionError: expected { Object (0, 1, ...) } to deeply equal { Object (0, 1, ...) }
      + expected - actual

         "20": 0,
         "21": 0,
         "22": 0,
         "23": 0,
      +  "24": 76,
      -  "24": 124,
         "25": 0,
         "26": 0,
         "27": 0,
         "28": 0,
         "688": 197,
         "689": 161,
         "690": 0,
         "length": 691,
      +  "offset": 5504,
      -  "offset": 6808,
         "parent": {
           "0": 39,
           "1": 117,
           "2": 115,

      at Assertion.assertEql (/usr/lib/node_modules/chai/lib/chai/core/assertions.js:489:10)
      at Assertion.ctx.(anonymous function) [as eql] (/usr/lib/node_modules/chai/lib/chai/utils/addMethod.js:40:25)
      at Assertion.assertEqual (/usr/lib/node_modules/chai/lib/chai/core/assertions.js:455:19)
      at Assertion.ctx.(anonymous function) [as equal] (/usr/lib/node_modules/chai/lib/chai/utils/addMethod.js:40:25)
      at Context.<anonymous> (/home/parag/rpmbuild/BUILD/gettext-parser-1.1.1/test/mo-compiler-test.js:18:38)
      at callFn (/usr/lib/node_modules/mocha/lib/runnable.js:223:21)
      at Test.Runnable.run (/usr/lib/node_modules/mocha/lib/runnable.js:216:7)
      at Runner.runTest (/usr/lib/node_modules/mocha/lib/runner.js:374:10)
      at /usr/lib/node_modules/mocha/lib/runner.js:452:12
      at next (/usr/lib/node_modules/mocha/lib/runner.js:299:14)
      at /usr/lib/node_modules/mocha/lib/runner.js:309:7
      at next (/usr/lib/node_modules/mocha/lib/runner.js:247:23)
      at Object._onImmediate (/usr/lib/node_modules/mocha/lib/runner.js:276:5)
      at processImmediate [as _immediateCallback] (timers.js:354:15)

  2) MO Compiler Latin-13 should compile:

      AssertionError: expected { Object (0, 1, ...) } to deeply equal { Object (0, 1, ...) }
      + expected - actual

         "20": 0,
         "21": 0,
         "22": 0,
         "23": 0,
      +  "24": 76,
      -  "24": 124,
         "25": 0,
         "26": 0,
         "27": 0,
         "28": 0,
         "694": 254,
         "695": 240,
         "696": 0,
         "length": 697,
      +  "offset": 2408,
      -  "offset": 4544,
         "parent": {
           "0": 123,
           "1": 10,
           "2": 32,

      at Assertion.assertEql (/usr/lib/node_modules/chai/lib/chai/core/assertions.js:489:10)
      at Assertion.ctx.(anonymous function) [as eql] (/usr/lib/node_modules/chai/lib/chai/utils/addMethod.js:40:25)
      at Assertion.assertEqual (/usr/lib/node_modules/chai/lib/chai/core/assertions.js:455:19)
      at Assertion.ctx.(anonymous function) [as equal] (/usr/lib/node_modules/chai/lib/chai/utils/addMethod.js:40:25)
      at Context.<anonymous> (/home/parag/rpmbuild/BUILD/gettext-parser-1.1.1/test/mo-compiler-test.js:27:38)
      at callFn (/usr/lib/node_modules/mocha/lib/runnable.js:223:21)
      at Test.Runnable.run (/usr/lib/node_modules/mocha/lib/runnable.js:216:7)
      at Runner.runTest (/usr/lib/node_modules/mocha/lib/runner.js:374:10)
      at /usr/lib/node_modules/mocha/lib/runner.js:452:12
      at next (/usr/lib/node_modules/mocha/lib/runner.js:299:14)
      at /usr/lib/node_modules/mocha/lib/runner.js:309:7
      at next (/usr/lib/node_modules/mocha/lib/runner.js:247:23)
      at Object._onImmediate (/usr/lib/node_modules/mocha/lib/runner.js:276:5)
      at processImmediate [as _immediateCallback] (timers.js:354:15)



error: Bad exit status from /var/tmp/rpm-tmp.4nq5EC (%check)

Any idea how to fix this?

Wrong hash table offset in mocompiler

From the specification of mo file format
https://www.gnu.org/software/gettext/manual/html_node/MO-Files.html

The hash table offset should be in mocompiler.js - line 192:
// hash table offset
returnBuffer[this._writeFunc](28 + (4 + 4) * list.length * 2, 24);

Please let me know if I was wrong.

Adjust line wrapping algorithm to be closer to the GNU gettext tooling

In our project we use both tools for different parts of our translation pipeline. Unfortunately the inconsistency creates very noisy commits. Bringing them closer together will make reviewing changes made with tools relying on gettext-parser a lot easier to review.

Examples from a diff
Below are examples of a diff. On the left is a .po file created by msginit and manipulated by GNU gettext tools. On the right side is the output of a call to gettext-parser's po.compile function.

The string @count Members are selected is 65 characters long, so it's not wrapped. However the entire line msgid_plural "@count Members are selected" is 80 characters long which seems to cause the GNU tools to wrap the line.

The same happens for

Books have a built-in hierarchical navigation. Use for handbooks or is 77 characters long. The GNU tools seem to allow this with the space being on the first line as the 77th character. gettext-parser will wrap one space earlier.

This seems to occur more often than expected. For example in the following lines as well.

It seems the GNU tools have a bit more knowledge of HTML while the gettext-parser treats href=\"\">[social_mentions:mentioned_user]</a> as an unbreakable string, the GNU break makes more sense because <a href=\"\"> together is more important.

A similar case for this can be found below, where the space is found to be a better breaking point than within a tag.

I also saw the following which actually suggests that the GNU tools allow 77 characters on a line so it may be a better default than 76. I couldn't find the GNU tools' line-break algorithm so I'm not sure whether they special case spaces and dots or just count to 77.

Open Social Branding will be replaced by site name (and slogan if available). is exactly 77 characters.

Tests for big endian mo files

Problem

The package has support for handling both big and little endian byte order, though only little endian is being tested.

Solution

Add test and fixtures for big endian mo files.

An option to disable automatic msgstr folding

Hi @smhg

what do you think about making automatic msgstr folding as an optional feature?

So that we can do something like the following:

po.compile(str, {
    autoFolding: false,
    maxLen: 123
});

Error in PO(T) compilation for plural strings

react-gettext-parser uses this library to compile POT files from JSX and JS sources.
However, if you have a function call to ngettext or other plural translations, it will result in only one empty string in the msgstr array of the translation. gettext-parser will rely on the number of msgstr elements to decide whether it should render msgstr or msgstr[0] instead of relying on the presence of a msgid_plural key. The relevant code is here: https://github.com/andris9/gettext-parser/blob/master/lib/pocompiler.js#L104

For me this results in having a POT file like this:

#: web/static/js/components/react_test.js:14
msgid "Buy envelope"
msgid_plural "Buy envelopes"
msgstr ""

instead of

#: web/static/js/components/react_test.js:14
msgid "Buy envelope"
msgid_plural "Buy envelopes"
msgstr[0] ""

which leads to my gettext tool throwing an error.

This could be fixed in react-gettext-parser but I think it might be more correct for gettext-parser to produce msgstr[0] if a msgid_plural is present and has content.

Module not found: Error: Can't resolve 'stream' in '../node_modules/gettext-parser/lib/poparser.js`

Hi!

This error appeared after updating @angular/cli 1.7 to 6.1.3.

I suggest to add readable-stream to dependencies and use it instead stream which is a mirror of the streams implementations in Node.js.

TypeScript support

It's impossible to use the gettext-parser with TypeScript due to lack of annotations. It would be ace to add them.

msgid duplicates behaviour

If I got duplicate msgid's on my po the po.parse does not throw any exceptions...

For example:

# it.po
msgid "apple"
msgstr "mela"

msgid "apple"
msgstr "melania"

Compiles to:

it.compiled.po
msgid ""
msgstr "Content-Type: text/plain;\n"

msgid "apple"
msgstr "melania"

Keeping only the last msgid...

Error installing gettext-parser

I get this error when installing a grunt plugin that requires gettext-parse:

c:\work\cast-admin\node_modules\po2json\node_modules\gettext-parser\node_modules\iconv>node "C:\Program Files\nodejs\node_modules\npm\bin\node-gyp-bin\\..\..\node_modules\node-gyp\bin\node-gyp.js" rebuild
gyp ERR! configure error
gyp ERR! stack Error: Can't find Python executable "python", you can set the PYTHON env variable.
gyp ERR! stack     at failNoPython (C:\Program Files\nodejs\node_modules\npm\node_modules\node-gyp\lib\configure.js:120:14)
gyp ERR! stack     at C:\Program Files\nodejs\node_modules\npm\node_modules\node-gyp\lib\configure.js:83:11
gyp ERR! stack     at Object.oncomplete (fs.js:107:15)
gyp ERR! System Windows_NT 6.1.7601
gyp ERR! command "node" "C:\\Program Files\\nodejs\\node_modules\\npm\\node_modules\\node-gyp\\bin\\node-gyp.js" "rebuild"
gyp ERR! cwd c:\work\cast-admin\node_modules\po2json\node_modules\gettext-parser\node_modules\iconv
gyp ERR! node -v v0.10.22
gyp ERR! node-gyp -v v0.11.0
gyp ERR! not ok
npm WARN optional dep failed, continuing [email protected]

Do I need to have Python installed on my system for this to work correctly? Or is this what you meant when you write:

If you get a bunchload of warnings or (non fatal) errors when installing, it is ok. These are most probably generated by the optional iconv dependency.

Is there anything I can do to fix it?

multiple references support

For now, to be able to pass multiple references for translation i need to concatenate them with \n symbols.
Would be great if we could pass those references as array. What do you think? I can create PR with this functionality.

Typescript update

Hey everyone,

I wanted to share a project I've been working on – a module designed to handle strings gathered by the parser, which incidentally was also developed by me. The purpose is to consolidate these strings into a cohesive 'set of blocks,' a process facilitated by your module.

I've found your module to be incredibly useful in my workflow, but I have a small suggestion for improvement. I think it would be fantastic if it were implemented in TypeScript, with the types directly provided within the npm module, similar to how it's done in gettext-merger here.

Adding TypeScript support and providing types within the npm package would greatly enhance its usability and compatibility with modern development practices. if some form of support or help is required I am here, I volunteer 😀

Looking forward to your thoughts on this!

ps I read an issue about 'comments', I think he was referring to the comments of the individual translation block and the fact that it is not optional not the fact that he wanted it removed

Silent error doesnt compile my .po

In my application I have some broken .po:

msgid "%s appunti e documenti condivisi da studenti in tutta Italia"
msgstr "%s notas e documentos compartilhados por estudantes de todo o mundo"

msgid "documenti per %s"
msgstr "documentos por %s""

There is a double quote duplication on the last line. Using gettext-parser no error was raised... what should I do? Its a bug? Or should I use a linter? If so there is something out there?

Max length on po strings

I noticed that longer strings not gets parsed when trying to compile my json files to po files.
Found that this package has a limit of 76 chars. Is there any special reason for this?

error when using mo files with angular app compiled with aot

if i'm use mo file i have
main.49a71bb36aed2e8c8fd7.js:1 ERROR TypeError: this._fileContents.readUInt32LE is not a function

but if i'm use po file all works fine.

What kind of 'compile' to use?

I can only produce output (in written file) like:

[object Object]

from an ajax response.

Doing this:

let po = gettextParser.po.parse( response );

and

let po = gettextParser.po.compile( response ); // This fails with:

TypeError: Cannot create property 'headers' on string 'msgid ""

I go about it like this:

xhr('https://localise.biz/api/export/locale/es.po?filter=login&format=script&key=ZnD4KUNjK5uV-fH8CN8rRYlEGGzMak-7S', {
	method: 'GET',
	name: login
}, function ( err, response ) {
	if ( err ) {
		throw err;
	}

	let po = gettextParser.po.parse( response );

	fs.writeFileSync('es_trans.json', po.translations[''], 'utf8', ( err ) => {
		console.log("File error:", err);
	}, 'utf-8');
});

Basically trying to read the response from the server, and write it to a json file.

What am I missing?

msgid >= 77 characters produces incorrect output

var gettextParser = require('gettext-parser');
var data = {
    "translations": {
        "": {
            "dfkjdfkljdfkljdskljdfklsjsdfkljdfsklsdjfklsdfjkldsfjdsflkjsdfkldsfjdskljdsfff": {
                "msgid": "dfkjdfkljdfkljdskljdfklsjsdfkljdfsklsdjfklsdfjkldsfjdsflkjsdfkldsfjdskljdsfff",
                "msgstr": []
            }
        }
    }
};

console.log(gettextParser.po.compile(data).toString());

Results in

msgid ""
msgstr "Content-Type: text/plain;\n"

msgid ""
"dfkjdfkljdfkljdskljdfklsjsdfkljdfsklsdjfklsdfjkldsfjdsflkjsdfkldsfjdskljdsdf"
"kjdfkljdfkljdskljdfklsjsdfkljdfsklsdjfklsdfjkldsfjdsflkjsdfkldsfjdskljdsdfkj"
"dfkljdfkljdskljdfklsjsdfkljdfsklsdjfklsdfjkldsfjdsflkjsdfkldsfjdskljds"
msgstr ""

The issue will not occure with 76 characters.

Doc: It's 'msgctxt' not 'msgctx'

In the documentation (README.md) the gettext message context parameter is incorrectly specified as msgctx. Correct is msgctxt.
Code and test fixtures correctly use msgctxt.

Tests fail on Windows due to EOL

The following tests fail on Windows, very likely due to EOL only differences

  PO Compiler
    Headers
      1) should keep tile casing
    UTF-8
      2) should compile
    Latin-13
      3) should compile
    Plurals
      4) should compile correct plurals in POT files
    Message folding
      5) should compile without folding
      6) should compile with different folding
    Sorting
      7) should sort output entries by msgid when `sort` is `true`
      8) should sort entries using a custom `sort` function

lowercase header transformation destroys custom headers of translation software

When parsing a pot file with this package, all header names are transformed to lowercase first, then upon saving, each letter after a dash in the name is transformed to uppercase.
However, some translation software writes custom headers that are case-sensitive, like Poedit:

"X-Poedit-KeywordsList: "
"__;_e;_n:1,2;_x:1,2c;_ex:1,2c;_nx:4c,1,2;esc_attr__;esc_attr_e;esc_attr_x:1,"
"2c;esc_html__;esc_html_e;esc_html_x:1,2c;_n_noop:1,2;_nx_noop:3c,1,2;__"
"ngettext_noop:1,2\n"
"X-Source-Language: en\n"
"X-Poedit-SearchPath-0: .\n"
"X-Poedit-SearchPathExcluded-0: *.min.js\n"

After handling my pot file with this package, I'm no longer able to update the translation strings from source in Poedit, because they are now written like this:

"X-Poedit-Keywordslist: "
"__;_e;_n:1,2;_x:1,2c;_ex:1,2c;_nx:4c,1,2;esc_attr__;esc_attr_e;esc_attr_x:1,"
"2c;esc_html__;esc_html_e;esc_html_x:1,2c;_n_noop:1,2;_nx_noop:3c,1,2;__"
"ngettext_noop:1,2\n"
"X-Source-Language: en\n"
"X-Poedit-Searchpath-0: .\n"
"X-Poedit-Searchpathexcluded-0: *.min.js\n"

Could you add an option to keep the header names untouched (or it might be a good idea not transforming any custom headers starting with X- by default?)? Or is Poedit simply violating some standards regarding header naming convention in pot files?

Support unescaped double quotes in message strings & IDs

The gettext standard isn't clear on whether double quotes (") should be escaped or not (see: https://stackoverflow.com/questions/10218631/printing-with-gettext).
However, not all translation management systems escape double quotes, so it would be good if gettext-parser could support unescaped "s.
I'll add a PR with a failing test asap and a complete fix eventually.

Backwards compatibility (es6)

Hey, I just wanted to comment to start a discussion on backwards compatibility and what the best approach would be. We recently upgraded to the version that was refactored to ES6 and this broke Internet Explorer because they obviously don't work with it.

Do you think you could ship a compiled version of your library in npm, or do you prefer we compile it on our end? For now we've special cased it:

           include: [
                path.resolve(__dirname, 'static/js'),
                path.resolve(__dirname, 'node_modules/gettext-parser'),
            ],

but I think its more common to ship a web ready version that doesn't require processing

Tolerate additional entries in obsolete messages

I have a .po file whose translations are handled with Poedit and I get obsolete entries in my .po files. The parser fails when it encounters them.

Reproduction

en.po

#, fuzzy
#~| msgid "Latest version"
#~ msgid "Latest Version"
#~ msgstr "Latest version"

Parsing the content of en.po with gettext-parser (running gettextParser.po.parse(content)) produces the following error:

node_modules\gettext-parser\lib\poparser.js:224
            throw err;
            ^

SyntaxError: Error parsing PO data: Invalid key name "|" at line 2. This can be caused by an unescaped quote character in a msgid or msgstr value.

Expected behavior

The parser should not crash. It should either ignore the entry, or properly parse it to data.

Misc

I believe this issue was introduced by the following PR: #64 -- released in version 5.0.0. 4.2.0 is the last version before 5.0.0.

Have a good day !

Version 5.0.0 seems to break Safari

Hello there,

i recently upgraded the package in my Angular application from version 4.2.0 to 5.0.0.
After that the Application did not start on both iOS and mac Safari anymore.

Angular version 13.2.4
Safari Version 15.2.

Browser console only said the following.
SyntaxError: Invalid regular expression: invalid group specifier name

Generated mo file doesn't work

Have problem with generated mo file. I already tested and know:

generated mo can't be converted to po at http://tools.konstruktors.com/
php gettext doesn't work with generated mo file
php gettext works fine if I use generated po and converted to mo at http://tools.konstruktors.com/
mo file is bigger than generated at http://tools.konstruktors.com/
gettextParser.mo.parse gives an error when I try to parse early generated mo

Transifex expects a comment line before headers

I'm using this to generate and submit PO files to Transifex. Transifex verifies the PO files on upload. After some experimenting I found that my uploads were accepted if I prepended #\n to the output of gettextParser.po.compile

It would be nice if there was a way to prepend a comment before the headers without doing it manually.

sorry, my mistake.

unable to read and use with gettext (and/or parser)

Hey i have two files, one does work the other not.

The en_US version works, but the en does not.

EDIT:
my fail

Avoid catastrophic backtracking

The foldLine utility method contains regular expressions which are vulnerable to catastrophic backtracking.

Preferred solution would be to come up with regex which isn't vulnerable as this would be a patch release. But a refactor using String.lastIndexOf() with a smaller list of folding characters (space, dash, dot,...) might be the only valid solution. This would however require it to be a minor release.

Thanks to @davisjam.

Compatible with Browserify?

I'd like to use this library in-browser. I installed node-js, Browserify, the encoding module, and bundled up your code. Everything runs without throwing any errors, but I'm getting strange PO file output. I was expecting a string of text, but I'm getting a string of numbers. Can you recommend a fix?

gettext-parser.js (before bundling with Browserify)

window.poParser = require('./poparser.js');
window.poCompiler = require('./pocompiler.js');
window.moParser = require('./moparser.js');
window.moCompiler = require('./mocompiler.js');

index.html

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>gettext-parser</title>
</head>
<body>
<script src="./gettext-parser.js"></script>
<script>
var example = {
  "charset": "iso-8859-1",

  "headers": {
    "content-type": "text/plain; charset=iso-8859-1",
    "plural-forms": "nplurals=2; plural=(n!=1);"
  },

  "translations": {
    "": {
      "": {
        "msgid": "",
        "msgstr": ["Content-Type: text/plain; charset=iso-8859-1\n..."]
      }
    },
    "another context": {
      "%s example": {
        "msgctx": "another context",
        "msgid": "%s example",
        "msgid_plural": "%s examples",
        "msgstr": ["% näide", "%s näidet"],
        "comments": {
          "translator": "This is regular comment",
          "reference": "/path/to/file:123"
        }
      }
    }
  }
};

var raw = poCompiler(example);
var blob = new Blob(raw, {type: "application/po"});
var objectUrl = URL.createObjectURL(blob);
window.open(objectUrl);
</script>
</body>
</html>

PO File Output:

10911510310510032343410109115103115116114323434103467111110116101110116458412111210158321161011201164711210897105110593299104971141151011166110511511145565653574549921103410348010811711497108457011111410911558321101121081171149710811561505932112108117114971086140110336149415992110341010353284104105115321051153211410110311710897114329911110910910111011610355832471129711610447116111471021051081015849505110109115103105100323437115321011209710911210810134101091151031051009511210811711497108323437115321011209710911210810111534101091151031151161149148933234373211022810510010134101091151031151161149149933234371153211022810510010111634

One thing I noticed is that my poCompiler reference does not contain .po.parse, .po.compile, .mo.parse, or .mo.compile function declarations. It is a bare function:

function (table) {
    var compiler = new Compiler(table);
    return compiler.compile();
}

Fails to recognize charset using a minimal header

When using the header:

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Content-Type: text/plain; charset=UTF-8\n"

The charset is recognized as:

iso-8859-1

But adding another header, or swapping the two headers results in utf-8 so long as charset is not the final header.

I believe the issue lies in this regex:

gettext-parser/lib/poparser.js

Line 82 in c575146

 if ((match = headers.match(/[; ]charset\s*=\s*([\w\-]+)(?:[\s;]|\\n)*"\s*$/mi))) { 

declaration of control variable of plurals

I'd like to save the name of the control variable for plural forms.
I will amend _parseComments and _drawComments to accept and output
#$ varname
bound to comments.control
This is trivial of course, its a case of whether we want it, and if so what $ and control should be

Different JSON structure

I am surprised the output of the translated strings is strucutred as associative array. This causes problems when you want to save it into MongoDB:

Unhandled rejection MongoError: The dotted field 'Zooming out / moving camera, please wait...' in 'uploaded_files..source_strings.Zooming out / moving camera, please wait...' is not valid for storage.
at Function.MongoError.create (c:\Users\h9pe\Documents\tms-reworked\node_modules\mongodb-core\lib\error.js:31:11)
at toError (c:\Users\h9pe\Documents\tms-reworked\node_modules\mongodb\lib\utils.js:139:22)
at c:\Users\h9pe\Documents\tms-reworked\node_modules\mongodb\lib\collection.js:1060:67
at c:\Users\h9pe\Documents\tms-reworked\node_modules\mongodb-core\lib\connection\pool.js:461:18
at _combinedTickCallback (internal/process/next_tick.js:73:7)
at process._tickCallback (internal/process/next_tick.js:104:9)

What about an option to store it as object array without a key for each object so that one can save the parsed po object into mongodb?

Support of Obsolete Messages

Hello,

have you planned to support obsolete messages in your po parser?

Example of obsolete messages in a po file:

#~ msgid "Obsolete message"
#~ msgstr "Message obsolète"

regards,

Custom validation rules

Hello 👋

To start off I would like to thank you for developing and providing the library to the community!

I have a feature request. As a part of automation process of working with PO files I need to impose validation rules while parsing the files, like ensuring there are no duplicate msgids or no entries with the same value for msgstr. The library already has a parser and a lexer, and I'd like to be able to reuse them for my particular needs. Two approaches come to my mind:

the library exposes abovementioned entities as a part of public API,
the library provides new API that would allow to define custom validation rules.

I'd like to learn you opinion and suggestions. Would you be willing to accept a PR?

compile should add a carriage return to the end of content

this is nitpicky but nice to have -

i'd like the output of gettextParser.po.compile to append a carriage return to its output. This is to reduce the number of lines shown in diffs after content is added. without the carriage return the last line in the previous version is always shown as changed since it didn't have the carriage return.

if there's agreement i'll go ahead and create a PR

Charset get lost and casing of header keys changes

Headers act weird:

        var translation = {
          charset: "UTF-8",
          headers: {
            "Project-Id-Version": "project 1.0.2",
            "Mime-Version": "1.0",
            "Content-Type": "text/plain; charset=UTF-8",
            "Content-Transfer-Encoding": "8bit",
            "Plural-Forms": "nplurals=2; plural=(n!=1);",
            "X-Poedit-SourceCharset": "UTF-8"
          },
          translations: {}
        }

And the generated files outputs:

msgid ""
msgstr ""
"Project-Id-Version: project 1.0.2\n"
"MIME-Version: 1.0\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"
"X-Poedit-Sourcecharset: UTF-8\n"
"Content-Type: text/plain;\n"

As you can see, Content-Type header lacks the charset and then the X-Poedit-SourceCharset header casing is weird and confuses 3rd party editors, it becomes X-Poedit-Sourcecharset

Can't the headers be sent as they are ? Without any magic title casing ?

Option to remove comments from JSON output

I'd like to get rid of the 'comments' ... how can I do that?

Parse options

Hello,
what about adding parse options like po2json has:

fuzzy Whether to include fuzzy translation in JSON or not. Should be either true or false. Defaults to false.

Support for 'empty' form of plurals

I want to go

msgstr[0] "i dont have any things"
msgstr[1] "i have one thing "
msgstr[2] "i have {{thing_count}} things "

I will assume that if we only have the typical two msgstr, e.g.

msgstr[0] "i have one thing "
msgstr[1] "i have {{thing_count}} things "

that these are the singular and plural forms, and so to use the plural for the empty case (I call it rank)
if I then compile that I will get

msgstr[0] "I have {{thing_count}} things"
msgstr[1] "i have one thing "
msgstr[2] "i have {{thing_count}} things "

This is going to potentially clash with existing translations, but using msgstr[2] for my 'empty' rank is so ugly, where as
0 == empty,
1 == singular
2 == plural
is more natural and descriptive.
In many, I suspect most languages, it's grammatically illegal to say "you have 0 of something". If we are going to bother factoring out the test for 1 or more than 1 and calling the relevant translation, imo we should be doing 0 also.

disabling line folding causes headers to break

If the gettextParser.po.compile method is run with { foldLength : 0 }, the headers will be written out in a single line, for example:

msgid ""
msgstr "Language: de_DE\nContent-Type: text/plain; charset=utf-8\nPlural-Forms: nplurals=2; plural=(n != 1);\n"

The reason is that in pocompiler.js:224, the header block is prepared to be inserted as a normal msgid/msgstr, but if line wrapping is disabled, then it won’t become split up in line 159.

A workaround is to set { foldLength: Number.MAX_SAFE_INTEGER }, but ideally headers shouldn’t be broken when line wrapping is disabled.

UTF-8 issue using readFileSync()

My umlauts are not parsed correctly from my utf8-encoded .po file using the following script

gettextParser.po.parse(
    fs.readFileSync(poFileName)
).translations['']

However, this works:

gettextParser.po.parse(
    fs.readFileSync(poFileName).toString()
).translations['']

Repository Refactoring (proposal)

In order to avoid making a huge PR with generic changes, I'm going to write a tracking issue here that lists the changes I would like to propose, in that way we will have a sort of changes tracking to isolate changes by 'type'.

This issue supersedes #80, which involved a series of structural changes too delicate to be made all at once.

check complete translation

Is there a way to check if all "pot entries" are translated in a specific po/mo ?