allenap / rust-petname

Generate human readable random names. Rust port of Dustin Kirkland's petname library.

Home Page: https://crates.io/crates/petname

License: Apache License 2.0

Rust 100.00%

rust-petname's People

Contributors

Stargazers

Watchers

Forkers

tiamat-tech vadixidav scsibug tranzystorekk mkroman grahamc nazmulidris dmgolembiowski

rust-petname's Issues

Switch to GitHub Actions

Currently using Travis CI. It works, but I find it confusing. This would also be an opportunity to learn GitHub Actions.

Add --alliterate-with option to specify a specific prefix for alliteration

For example:

$ petname --alliterate-with s --count 10
striking-skink
sweet-skink
super-stork
suitable-sheepdog
singular-sparrow
suitable-shepherd
splendid-stinkbug
secure-shepherd
sought-sawfish
sharing-shiner

Maybe add a -A alias too?

Cheap, non-allocating generator

Currently, initializing a generator parses and allocates all the words. That makes it too expensive for on-the-fly use, e.g. in a web server where names are generated per client request. The generator therefore has to be initialized once and kept around, at which point it holds memory for the lifetime of the process.

Even if you only need one name, you have to process all the words, when you'd only need the total count of words and the map from a word index to its reference.

There could be a generator that uses a static slice of words: no runtime parsing or allocation, and the word list lives only in static program memory, not also on the heap (roughly halving memory usage).

Edit: Also space could be saved if words were not stored as a static newline-separated string but as something like (&'static [[u8; 1]], &'static [[u8; 2]], &'static [[u8; 3]], ...) i.e. lists of words indexed by length (even more efficient than [&'static str] because we don't need one reference per word). I hope such a list can be built at compile-time from the newline-separated files.
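A minimal sketch of the length-bucketed layout described above, assuming two tiny hand-written buckets. The names WORDS_3, WORDS_4, word_count, and word_at are hypothetical and not part of the crate; a real version would generate the tables at build time from the word files.

```rust
// Hypothetical compile-time tables: words grouped by length, stored as
// fixed-size byte arrays so no per-word reference (pointer + length) is kept.
static WORDS_3: &[[u8; 3]] = &[*b"bee", *b"jay", *b"kit", *b"ray"];
static WORDS_4: &[[u8; 4]] = &[*b"buck", *b"colt", *b"fawn", *b"wren"];

/// Total number of words across all buckets.
pub fn word_count() -> usize {
    WORDS_3.len() + WORDS_4.len()
}

/// Map a global index to a word without parsing or allocating.
pub fn word_at(index: usize) -> Option<&'static str> {
    if let Some(word) = WORDS_3.get(index) {
        return std::str::from_utf8(word).ok();
    }
    // Past the first bucket: continue counting into the next one.
    let index = index - WORDS_3.len();
    WORDS_4.get(index).and_then(|word| std::str::from_utf8(word).ok())
}
```

Picking a random word then only needs a random index below word_count(), which is the "total count plus index-to-word map" the issue asks for.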

Upstream petname has -u / --ubuntu option

This generates Ubuntu-style names: alliteration of first character of each word.

I like this, but I'm not sure I want to copy this UX exactly. Maybe an --alliterate flag?

Improve README.md

Move the "Features & no_std support" section down – but call it out in the introduction – because I think it could be off-putting for someone who just wants a CLI or a simple library.
In the introduction, list the ways the crate can be used: CLI, library (as-is, stripped-down, or fully no_std).
Give examples of situations where petname is useful?
Move the code examples into the generated docs, i.e. https://docs.rs/petname/.

Make `Names` non-public, hide behind `impl Trait`

There's no need for Names to be public. It has a cardinality method that is convenient, but it just passes through to Petnames::cardinality, so we could remove it. A kind-of replacement would be to implement Iterator::size_hint.
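As an illustration only, here is how a private iterator type can be returned as impl Iterator while still reporting how many names remain through size_hint. Words and Combinations are hypothetical stand-ins, and this sketch enumerates every combination once rather than choosing names at random as Names does.

```rust
/// Hypothetical stand-in for the crate's word lists.
pub struct Words<'a> {
    pub adjectives: &'a [&'a str],
    pub nouns: &'a [&'a str],
}

// Private iterator type: callers only ever see `impl Iterator`.
struct Combinations<'a> {
    words: &'a Words<'a>,
    next: usize,
}

impl<'a> Iterator for Combinations<'a> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let total = self.words.adjectives.len() * self.words.nouns.len();
        if self.next >= total {
            return None;
        }
        let adjective = self.words.adjectives[self.next / self.words.nouns.len()];
        let noun = self.words.nouns[self.next % self.words.nouns.len()];
        self.next += 1;
        Some(format!("{adjective}-{noun}"))
    }

    // Exact number of names still to come; plays the role of `cardinality`.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let total = self.words.adjectives.len() * self.words.nouns.len();
        let remaining = total - self.next;
        (remaining, Some(remaining))
    }
}

impl<'a> Words<'a> {
    pub fn iter(&'a self) -> impl Iterator<Item = String> + 'a {
        Combinations { words: self, next: 0 }
    }
}
```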

Ensure that default word lists are unique

The following snippet will find duplicates:

find words -name '*.txt' |
  while read filename; do
    printf '=== %q\n' "$filename" && sort "$filename" | uniq -d
  done

In CI we want to ensure that list is empty.
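The same check could also live in the crate's own test suite. The following is a sketch under that assumption; the duplicates function is hypothetical, and in practice it would be fed each list via something like include_str!.

```rust
use std::collections::BTreeMap;

/// Return every word that appears more than once in a
/// whitespace-separated list, in sorted order.
pub fn duplicates(list: &str) -> Vec<&str> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for word in list.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|&(_, count)| count > 1)
        .map(|(word, _)| word)
        .collect()
}
```

A test would then assert that duplicates(...) is empty for every shipped word list.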

Upstream petname has -d / --dir option

This allows loading alternative word lists. The target should contain adverbs.txt, adjectives.txt, and names.txt. The default upstream is /usr/share/petname. The default in rust-petname is to use the compiled-in word lists.

Optimise word lists for common operations

This follows on from ideas in #76:

... space could be saved if words were not stored as a static newline-separated string but as something like (&'static [[u8; 1]], &'static [[u8; 2]], &'static [[u8; 3]], ...), i.e. lists of words indexed by length (even more efficient than [&'static str] because we don't need one reference per word). I hope such a list can be built at compile-time from the newline-separated files.

There are at least a couple of common things that the petname command-line tool lets you do that might benefit from preprocessing the word lists before they're compiled in, i.e. alliteration, and word length limits:

    -a, --alliterate                  Generate names where each word begins with the same letter
    -A, --alliterate-with <LETTER>    Generate names where each word begins with the given letter

    -l, --letters <LETTERS>           Maximum number of letters in each word; 0 for unlimited [default: 0]

Separately, I am thinking about changing the -l, --letters <LETTERS> option to take a range, e.g. 3-8. That might have a bearing on how to preprocess the default word lists.
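To make the range idea concrete, here is one way the argument might be parsed and applied. This is a sketch, not the tool's current behaviour: parse_letters and filter_by_length are hypothetical names, and treating a bare number as a maximum (with 0 meaning unlimited) is an assumption carried over from the existing option.

```rust
use std::ops::RangeInclusive;

fn parse_number(s: &str) -> Result<usize, String> {
    s.trim()
        .parse::<usize>()
        .map_err(|e| format!("invalid number {s:?}: {e}"))
}

/// Parse "3-8" as an inclusive range of word lengths.
pub fn parse_letters(arg: &str) -> Result<RangeInclusive<usize>, String> {
    match arg.split_once('-') {
        Some((min, max)) => {
            let (min, max) = (parse_number(min)?, parse_number(max)?);
            if min > max {
                return Err(format!("empty range: {min}-{max}"));
            }
            Ok(min..=max)
        }
        // Assumption: a bare number stays a maximum, with 0 for unlimited.
        None => {
            let max = parse_number(arg)?;
            let max = if max == 0 { usize::MAX } else { max };
            Ok(0..=max)
        }
    }
}

/// Keep only the words whose length falls inside the range.
pub fn filter_by_length<'a>(words: &[&'a str], letters: &RangeInclusive<usize>) -> Vec<&'a str> {
    words
        .iter()
        .copied()
        .filter(|word| letters.contains(&word.len()))
        .collect()
}
```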

Petnames should provide an Iterator of names

This would be a more Rustic way of providing names.

Update README to refer to `structopt` feature

At present it says:

clap enables the clap command-line argument parser.

However, the clap feature is now an alias for structopt.

`petname -a --count N` should produce names with different starting letters

Command: petname --count 1000 -a | cut -c 1 | sort | uniq | wc -l

Produced result: 1

Expected result: anything but 1 pretty much every time

Non-repeating iterator

Neither Petnames::generate nor Names guarantees that a name is not repeated. It should be possible to create a non-repeating iterator by shuffling the word lists and then yielding their cartesian product. This is obviously more expensive than choosing words on demand, though I assume not awful when the guarantee of non-repetition is needed. Maybe there's a less expensive way? Unless this can be done at near-zero cost, the default should stay as it is now.
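A rough sketch of the shuffle-then-cartesian-product idea. XorShift64 is a toy generator used only to keep the example free of dependencies (a real implementation would take an RNG from rand), and shuffle and non_repeating are hypothetical names.

```rust
/// Toy xorshift generator, standing in for a real RNG.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        XorShift64(seed.max(1)) // the state must never be zero
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fisher-Yates shuffle (modulo bias is ignored for brevity).
pub fn shuffle<T>(items: &mut [T], rng: &mut XorShift64) {
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

/// Shuffle both lists once, then lazily yield every pair exactly once.
pub fn non_repeating(
    mut adjectives: Vec<&'static str>,
    mut nouns: Vec<&'static str>,
    rng: &mut XorShift64,
) -> impl Iterator<Item = String> {
    shuffle(&mut adjectives, rng);
    shuffle(&mut nouns, rng);
    let total = adjectives.len() * nouns.len();
    (0..total).map(move |i| {
        // The first word cycles fastest, like the low digit of a counter.
        format!("{}-{}", adjectives[i % adjectives.len()], nouns[i / adjectives.len()])
    })
}
```

The cost is two shuffles up front; after that each name is produced from a single counter.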

Template strings

Ideas:

petname %v-%j-%n would yield "$adverb-$adjective-$noun", e.g. "fully-select-airedale" (same as petname -w3).
petname 'My %J %N' would yield "My $Adjective $Noun", e.g. "My Keen Toucan" (literal text & capitalisation).
...
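One possible shape for the template expansion, as a sketch. The specifier letters follow the ideas above (%j adjective, %v adverb, %n noun, upper case for capitalised); Pick and render are hypothetical names, and passing unknown specifiers through unchanged is an assumption.

```rust
/// One randomly chosen word of each kind (hypothetical helper type).
pub struct Pick<'a> {
    pub adverb: &'a str,
    pub adjective: &'a str,
    pub noun: &'a str,
}

fn capitalise(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

/// Expand %j, %v, %n (and capitalised %J, %V, %N) in a template.
pub fn render(template: &str, pick: &Pick<'_>) -> String {
    let mut out = String::new();
    let mut chars = template.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('j') => out.push_str(pick.adjective),
            Some('v') => out.push_str(pick.adverb),
            Some('n') => out.push_str(pick.noun),
            Some('J') => out.push_str(&capitalise(pick.adjective)),
            Some('V') => out.push_str(&capitalise(pick.adverb)),
            Some('N') => out.push_str(&capitalise(pick.noun)),
            // Assumption: unknown specifiers are passed through unchanged.
            Some(other) => {
                out.push('%');
                out.push(other);
            }
            None => out.push('%'),
        }
    }
    out
}
```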

Upstream petname has -l / --letters option

Limit maximum number of letters in each word, where the default is unlimited. The minimum is 3.

Macro to compile in custom word lists

With the default-words feature, one can compile in the word lists that ship with this crate. There's a certain amount of machinery built around that, but it's not possible to easily compile in custom word lists – one has to reproduce much of that same machinery. It would be cool if anyone could make use of that via, say, a petnames! macro.

Panics when pipe is closed in streaming mode (--count=0)

For example:

$ petname --count=0 | grep ^e | head -n1
enough-calf
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', src/libstd/io/stdio.rs:805:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
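The panic comes from println!, which aborts when stdout fails. A common way around it, sketched here with a hypothetical stream function, is to write with writeln! and treat a broken pipe as a normal end of output.

```rust
use std::io::{self, ErrorKind, Write};

/// Write names until the iterator ends or the reader goes away.
pub fn stream<W: Write>(mut out: W, names: impl Iterator<Item = String>) -> io::Result<()> {
    for name in names {
        match writeln!(out, "{name}") {
            Ok(()) => {}
            // The reader (e.g. `head -n1`) closed the pipe: stop quietly.
            Err(e) if e.kind() == ErrorKind::BrokenPipe => return Ok(()),
            Err(e) => return Err(e),
        }
    }
    out.flush()
}
```

Any other I/O error is still returned to the caller, so only the closed-pipe case is silenced.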

Overlap between word lists

For example, the following words are common to both words/medium/names.txt and words/large/names.txt:

bee, buck, bunny, colt, coral, dane, drake, fawn, fisher, jay, joey, kit, ling, loris, mara, marlin, martin, merlin, molly, phoebe, phoenix, raven, ray, rhea, robin, shad, wren, zander

This is a problem because Petnames::default concatenates word lists of all sizes without deduplication, so these common words are more likely to be chosen than others.

This is not a problem upstream because only one word list (small, medium, or large) is used at any time. This is selected by the --complexity option at the command-line, so fixing #6 may eliminate this issue.
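Independently of the --complexity question, the bias could be removed by deduplicating when lists are combined. A small sketch, where merge_unique is a hypothetical helper and the sorted output order is a side effect of using a BTreeSet:

```rust
use std::collections::BTreeSet;

/// Combine several whitespace-separated word lists, keeping each word once.
pub fn merge_unique<'a>(lists: &[&'a str]) -> Vec<&'a str> {
    let unique: BTreeSet<&'a str> = lists
        .iter()
        .copied()
        .flat_map(|list| list.split_whitespace())
        .collect();
    unique.into_iter().collect()
}
```

With every word present exactly once, each is equally likely to be chosen.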

`petname` integration tests did not spot buffering issue

I think at some point during the v2 alpha/beta period I wrapped stdout with a buffered writer. It was never being explicitly flushed, but its Drop impl does that so it should have worked. However, I was creating the buffer in main and keeping it in scope until the end of the function – and since I was calling std::process::exit directly in main, the buffer's Drop impl was not called before the process exited.

The bug is: I didn't notice this, and released 2.0.0 with this issue!

The integration tests currently in this project call run directly, i.e. they missed the logic in main that caused the bug. An integration test that invokes petname from the outside could have prevented this.
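For reference, the underlying hazard is that std::process::exit does not run destructors, so a BufWriter still alive in main is never flushed. One way to avoid relying on Drop, sketched with a hypothetical run function, is to own the buffer inside run and flush it explicitly before returning.

```rust
use std::io::{self, BufWriter, Write};

/// Write through a buffer that is created, flushed, and dropped in here.
pub fn run<W: Write>(out: W) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    writeln!(out, "enough-calf")?;
    // Flush explicitly: `Drop` would also flush, but it ignores errors and
    // never runs at all if the process exits while the buffer is alive.
    out.flush()
}

// In `main` one would call `run` first and only then exit, for example:
//
//     let code = match run(std::io::stdout().lock()) {
//         Ok(()) => 0,
//         Err(_) => 1,
//     };
//     std::process::exit(code); // the buffer no longer exists at this point
```

An outside-in integration test would still be needed to catch a regression in main itself.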

`-V, --version` arguments no longer recognised in v1.1.3

Both -V and --version worked in v1.1.2.

Rename `Petnames`'s `names` field to `nouns`

The term "names" is used to mean different things. To reduce confusion, and to be consistent with Petnames's other fields, its names field should become nouns.

Generate multiple names at a time from the command-line

Maybe you want 1000 names to which you'll then apply another external filter. Or maybe even stream names out continuously, with back-pressure coming from pipe buffers.

Upstream petname has -c / --complexity option

where it can be one of [0, 1, 2]; 0 = easy words, 1 = standard words, 2 = complex words.

This switches between the small (0), medium (1), and large (2) word sets. Values other than 0, 1, or 2 use the default word set, which is small.

Not sure I want to copy this UX exactly. Maybe it's more interesting to have --small, --medium, --large flags, or a --word-set={small,medium,large} option.

Make --stream an alias for --count=0, then remove --count=0 as a special case

`petname --non-repeating` applies randomness "unevenly"

When running petname --non-repeating, for simplicity of implementation the word lists are shuffled only once at the start, then iterated through like a counter. For example, supposing we have 2 word lists:

a, b, c
x, y, z

The output of petname --non-repeating --stream might be:

c-y
a-y
b-y
c-z
a-z
b-z
c-x
a-x
b-x

Note that for the second word, we see all the ys, then all the zs, then the xs. They could come in any order, but they'll always be grouped. Note also that the order in which we see the first words repeats within each group.

It's easy to observe this:

$ petname --non-repeating --alliterate-with k --count 30
known-koala
key-koala
keen-koala
kind-koala
knowing-koala
known-killdeer
key-killdeer
keen-killdeer
kind-killdeer
knowing-killdeer
known-kid
key-kid
keen-kid
kind-kid
knowing-kid
known-kite
key-kite
keen-kite
kind-kite
knowing-kite
known-kiwi
key-kiwi
keen-kiwi
kind-kiwi
knowing-kiwi
known-kitten
key-kitten
keen-kitten
kind-kitten
knowing-kitten

My expectation is that there would be no obvious patterns like this.
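One direction that might reduce the visible grouping, offered only as a sketch and not as what petname does: walk the combined index space with a stride that is coprime to its size, then decode each index into a pair of word positions. Every pair is still visited exactly once. The name strided_pairs and its start and stride parameters are hypothetical, and the order remains regular rather than truly random.

```rust
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Visit every (row, col) pair exactly once, jumping by a fixed stride.
pub fn strided_pairs(
    rows: usize,
    cols: usize,
    start: usize,
    stride: usize,
) -> impl Iterator<Item = (usize, usize)> {
    let total = rows * cols;
    let mut stride = if total == 0 { 0 } else { stride % total };
    // Bump the stride until it is coprime with `total`; only then does the
    // walk cover every index before returning to its starting point.
    while total > 1 && gcd(stride, total) != 1 {
        stride += 1;
    }
    let mut index = if total == 0 { 0 } else { start % total };
    (0..total).map(move |_| {
        let pair = (index / cols, index % cols);
        index = (index + stride) % total;
        pair
    })
}
```

In practice start and stride would be drawn from the RNG, and the word lists would still be shuffled first.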

Limit docs.rs generation to a single platform

rust-petname's documentation is the same on all platforms, so there's no need for docs.rs to build the crate on every Tier 1 platform. This blog post explains how to limit the platform, and why.

Support for platforms that getrandom does not support

Currently, it isn't possible to use this library on a platform I am using. The reason is that petname seems to depend on getrandom which has unsupported targets (https://docs.rs/getrandom/0.2.1/getrandom/#unsupported-targets).

I DO have an implementation of Rng (from the rand crate) on my platform, which petname supports, but because petname also depends on getrandom, I am unable to use petname without forking it. Would you be willing to accept a PR that adds features to petname so that using default-features = false does not pull in getrandom?

Relevant error:

error: target is not supported, for more information see: https://docs.rs/getrandom/#unsupported-targets
   --> /home/worleyg/.cargo/registry/src/github.com-1ecc6299db9ec823/getrandom-0.2.1/src/lib.rs:214:9
    |
214 | /         compile_error!("target is not supported, for more information see: \
215 | |                         https://docs.rs/getrandom/#unsupported-targets");
    | |_________________________________________________________________________^