aconv

Converts texts from the auto-detected encoding to UTF-8 or a specified encoding.
It is similar to the iconv command, with the following differences:

  • Detects encoding if the source encoding is not specified.
  • Replaces malformed byte sequences with the REPLACEMENT CHARACTER or the corresponding numeric character reference, which depends on the destination charset (i.e. Unicode or not).
  • Can recursively convert files in directories and output converted files to the specified directory preserving the directory hierarchy.
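The two replacement strategies in the list above can be illustrated with the standard library alone. This is a minimal sketch, not aconv's implementation (aconv relies on encoding_rs): the function names `decode_utf8_lossy` and `encode_ascii_with_ncr` are hypothetical, and plain ASCII stands in for "a destination encoding that is not Unicode".

```rust
/// Decodes bytes as UTF-8, replacing each malformed sequence with
/// the REPLACEMENT CHARACTER (U+FFFD).
fn decode_utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Encodes text for an ASCII-only destination (a stand-in for any
/// non-Unicode encoding). Each character the destination cannot
/// represent is written as a decimal numeric character reference.
fn encode_ascii_with_ncr(text: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(text.len());
    for ch in text.chars() {
        if ch.is_ascii() {
            out.push(ch as u8);
        } else {
            // Unmappable in the destination: emit "&#<code point>;".
            out.extend_from_slice(format!("&#{};", ch as u32).as_bytes());
        }
    }
    out
}
```

For instance, `decode_utf8_lossy(b"ab\xFFcd")` turns the stray `0xFF` byte into U+FFFD, while `encode_ascii_with_ncr` writes any non-ASCII character as a reference of the form `&#NNNN;`.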

Because this library depends on encoding_rs, the available encodings are those defined in the Encoding Standard.

Note: UTF-16 files must have a BOM to be detected as UTF-16.
This is because chardetng, on which this library depends, does not support UTF-16, so this library only adds BOM sniffing to detect it.
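BOM sniffing amounts to checking the first few bytes of the input against the known byte order marks. The following is a minimal sketch of that idea, assuming a hypothetical helper named `sniff_bom`. It is not taken from aconv's source.

```rust
/// Returns the encoding label implied by a leading byte order mark,
/// or `None` when the input does not start with a BOM.
fn sniff_bom(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some("UTF-8")
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some("UTF-16LE")
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some("UTF-16BE")
    } else {
        // No BOM: a statistical detector such as chardetng has to guess,
        // and it never guesses UTF-16.
        None
    }
}
```

A UTF-16 file without a BOM falls into the `None` branch, which is why such files are not recognized as UTF-16.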

Installation

cargo install aconv

Usage

Converts texts from the auto-detected encoding to UTF-8 or a specified encoding.
If byte sequences that are malformed as Unicode are found,
they are replaced with the REPLACEMENT CHARACTER (U+FFFD).
If the destination encoding is not Unicode and unmappable characters are found, they are
replaced with the corresponding numeric character references.
If encoding detection is considered to have failed, the input texts are output as-is,
meaning no conversion takes place, and an error message is emitted.

USAGE:
    aconv [FLAGS] [OPTIONS] [FILE]...

FLAGS:
    -h, --help       Prints help information
    -l, --list       Prints supported encodings
    -q, --quiet      Suppresses error messages when encoding detection failed
    -s, --show       Only shows auto-detected encodings without decoded texts
    -V, --version    Prints version information

OPTIONS:
    -o, --output <DIRECTORY>                 Output directory. If input arguments contain directories, the directory
                                             hierarchies are preserved under DIRECTORY
    -t, --to-code <ENCODING>                 The encoding of the output [default: UTF-8]
    -A, --non_ascii_to_guess <NUMBER>        The number of non-ASCII characters to guess the encoding. Around 100
                                             characters are enough for most cases, but if the guess is not accurate,
                                             increasing the value might help [default: 100]
    -T, --non-text-threshold <PERCENTAGE>    The threshold (0-100) of non-text character occurrence. Above this
                                             threshold in decoded UTF-8 texts, the encoding detection is treated as it
                                             failed. In that case the input texts are output as-is with an error message
                                             emitted [default: 0]

ARGS:
    <FILE>...    Files (or directories) to process
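The check behind `--non-text-threshold` can be pictured as a simple percentage test on the decoded text. The sketch below is an illustration under stated assumptions, not aconv's code: the names `is_non_text` and `detection_failed` are hypothetical, and "non-text character" is assumed to mean a control character other than tab, line feed, or carriage return, which may differ from the definition aconv actually uses.

```rust
/// Assumption: a "non-text" character is a control character other than
/// tab, line feed, or carriage return. aconv's real definition may differ.
fn is_non_text(ch: char) -> bool {
    ch.is_control() && !matches!(ch, '\t' | '\n' | '\r')
}

/// Returns true when the share of non-text characters in `decoded` is
/// above `threshold_percent` (0-100), meaning detection counts as failed.
fn detection_failed(decoded: &str, threshold_percent: f64) -> bool {
    let total = decoded.chars().count();
    if total == 0 {
        // Empty input contains no non-text characters.
        return false;
    }
    let non_text = decoded.chars().filter(|&c| is_non_text(c)).count();
    let percent = non_text as f64 * 100.0 / total as f64;
    percent > threshold_percent
}
```

With the default threshold of 0, a single non-text character is enough for the check to report a failure. Raising the threshold tolerates a proportionally larger share of such characters.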

How encoding detection works

See transcoding_rs.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
