Giter Club home page Giter Club logo

sqids-spec's Introduction

Sqids Specification

Github Actions

This is the main repository for Sqids specification. It is meant to be the guide for future ports of different languages.

The code is optimized for readability and clarity; individual implementations should optimize for performance as needed.

All unit tests should have matching results.

๐Ÿ‘ฉโ€๐Ÿ’ป Get started

npm install
npm test

The main Sqids library is in src/index.ts; unit tests are in src/tests.

Use the following to format & check changes:

npm run format
npm run lint

๐Ÿšง Improvements (over Hashids)

  1. The user is not required to provide randomized input anymore (there's still support for custom IDs).
  2. Better internal alphabet shuffling function.
  3. With default alphabet - Hashids is using base 49 for encoding-only, whereas Sqids is using base 61.
  4. Safer public IDs, with support for custom blocklist of words.
  5. Separators are no longer limited to characters "c, s, f, h, u, i, t". Instead, it's one rotating separator assigned on the fly.
  6. Simpler & smaller implementation: only "encode" & "decode" functions.

๐Ÿ”ฌ How it works

Sqids is basically a decimal to hexademical conversion, but with a few extra features. The alphabet is larger, it supports encoding several numbers into a single ID, and it makes sure generated IDs are URL-safe (no common profanity).

Here's how encoding works:

  1. An offset index is chosen from the given input
  2. Alphabet is split into two pieces using that offset and those two halfs are swapped
  3. Alphabet is reversed
  4. For each input number:
    1. The first character from the alphabet is reserved to be used as separator
    2. The rest of the alphabet is used to encode the number into an ID
    3. If this is not the last number in the input array, the separator character is appended
    4. The alphabet is shuffled
  5. If the generated ID does not meet the minLength requirement:
    • The separator character is appended
    • If still does not meet requirement:
      • Another shuffle occurs
      • The separator character is again appended to the remaining id + however many characters needed to meet the requirement
  6. If the blocklist function matches the generated ID:
    • offset index is incremented by 1, but never more than the length of the alphabet (in that case throw error)
    • Re-encode (repeat the whole procedure again)

Decoding is the same process but in reverse. A few things worth noting:

  • If two separators are right next to each other within the ID, that's fine - it just means the rest of the ID are junk characters used to satisfy the minLength requirement
  • The decoding function does not check if ID is valid/canonical, because we want blocked IDs to still be decodable (the user can check for this stuff themselves by re-encoding decoded numbers)

๐Ÿ“ฆ How to port Sqids to another language?

Implementations of new languages are more than welcome! To start:

  1. Make sure the language is not already implemented. At this point, if you see a Hashids implementation, but not a Sqids implementation: we could use your help on converting it.
  2. The main spec is here: https://github.com/sqids/sqids-spec/blob/main/src/index.ts. It's ~300 lines of code and heavily commented. Comments are there for clarity, they don't have to exist in your own implementation.
  3. Fork the repository/language you'd like to implement to your own Github account. If the repository/language does not exist under the Sqids Github account, open a new issue under the spec repo so we can create a blank repo first.
  4. Implement the main library + unit tests + Github Actions (if applicable). You do not need to port tests in the internal folder; they are there to test the algorithm itself.
  5. Add a README.md -- you can re-use any of the existing ones.
  6. Please use the blocklist from https://github.com/sqids/sqids-blocklist. It will contain the most up-to-date list. Do not copy and paste the blocklist from other implementations, as they might not be up-to-date.
  7. Create a pull request, so we can review & merge it.
  8. If the repo has no active maintainers, we'll invite you to manage it (and maybe even merge your own PR).
  9. Once the library is ready, we'll update the website.

๐Ÿ“‹ Notes

  • The reason prefix character is used is to randomize sequential inputs (eg: [0, 1], [0, 2], [0, 3]). Without the extra prefix character embedded into the ID, the output would start with the same characters.
  • Internal shuffle function does not use random input. It consistently produces the same output.
  • If new words are blocked (or removed from the blocklist), the encode() function might produce new IDs, but the decode() function would still work for old/blocked IDs, plus new IDs. So, there's more than one ID that can be produced for same numbers.
  • FAQ section is here: https://sqids.org/faq

๐Ÿป License

Every official Sqids library is MIT-licensed.

sqids-spec's People

Contributors

4kimov avatar aradalvand avatar laserpants avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

sqids-spec's Issues

Update .NET code snippets on the site

This is probably the last time ๐Ÿ˜…

The return type of the Decode method has changed from int[] to IReadOnlyList<int> (to eliminate unnecessary memory allocation); so the code snippets on the .NET page of the site need to be updated to:

var sqids = new SqidsEncoder<int>();
var id = sqids.Encode(1, 2, 3); // "86Rf07"
var numbers = sqids.Decode(id); // [1, 2, 3]
var sqids = new SqidsEncoder<int>(new()
{
    Alphabet = "FxnXM1kBN6cuhsAvjW3Co7l2RePyY8DwaU04Tzt9fHQrqSVKdpimLGIJOgb5ZE"
});
var id = sqids.Encode(1, 2, 3); // "B4aajs"
var numbers = sqids.Decode(id); // [1, 2, 3]
var sqids = new SqidsEncoder<int>(new()
{
    MinLength = 10
});
var id = sqids.Encode(1, 2, 3); // "86Rf07xd4z"
var numbers = sqids.Decode(id); // [1, 2, 3]

ClojureScript Implementation

I have created an implementation in ClojureScript. If you make a sqids-cljs blank repo I will fork it and create a pull request.

Simpler

Since the initial library has been out for a few weeks, I got to mess around with it a bit more and realized there are a few rough edges we could iron out:

  1. Lower reserved characters count from 3 to 1 (more data packing into generated IDs)
  2. As a side-effect of number one, min alphabet length can decrease from 5 to 3
  3. Decrease blocklist-related re-generation attempts from almost-unlimited to the length of the alphabet (cc @tzvetkoff)
  4. As a result of number three, re-generated IDs don't have to grow in length (try blocking "SrIu" in the playground to see what I'm talking about)
  5. Also as a result of number three, min length padding is smoother now (cc @joviczarko)
  6. Sounds like minValue and maxValue are not that useful, let's remove? (cc @peterhellberg)
  7. Increase min length limit from alphabet-length to an arbitrary value (1_000 for example)

Pros:

  • Less code; spec (with comments, etc) went from 336 to 301
  • Limited (but still sufficient it seems) number of re-generation attempts, as opposed to currently almost-unlimited / less-predictable
  • Smoother blocklist and min length handling
  • Packing more data into IDs + smaller alphabet allowed

Cons:

  • More dev work for maintainers to re-adjust code (you + me)
  • IDs change, which means users that might have current libs in production would have breaking prod changes (although most of our libs are pre-production version (v0.x) ...so comes with the territory)

Unknowns:

  • Are there any use-cases we don't test for since changing reserved chars from 3 to 1?

Current: https://sqids.org/playground (algorithm code / explanation)
Proposed: https://sqids.org/playground/simpler (algorithm code / explanation)


P.S: Also worth noting that I have no intention of changing the algorithm over and over; I figured this might be cleaner + address some current issues/questions, while we're still technically mostly pre-prod.

Thoughts/feedback?

cc @laserpants @vinkla @niieani @antimon2

Why is the algorithm considered cryptographically "not secure"?

I am in the process of implementing a T-SQL port of the algorithm so I have some understanding of the algorithm. I am a bit surprised that the algorithm is considered insecure. It is a semetric encryption algorithm and the keyspace is the factorial of the length of the alphabet. This is definitely sufficiently large.

Of course, it is very much vulnerable to plain text attacks - especially if small consecutive numbers are encoded, like the ids in a master table. The minimum length feature will not mitigate this.

But if the encoded numbers are sufficiently large and the plaintext is not known to the attacker... I don't see any issue. Could you explain why you consider the algorithm insecure?

(I admit the above ifs are pretty big in real life use cases but I was wondering about the algorithm.)

Zig port

Sqids has been ported to Zig here with a minimal test suite.
I will refactor it into a proper package, but it's already functional as is.
Would the Sqids stewardship be willing to create a sqids/sqids-zig repository to host it?

Pascal implementation

I have created an implementation in Pascal. It works with both Free Pascal and Delphi and includes unit tests for both. Could you please make a sqids-pascal repo? I will then fork it and create a pull request.

The incompatibility with Hashids is unacceptable

Wouldn't it have been better to just continue Hashids (the algorithm) just under the new name and organization?!

The so-called improvements that the new algorithm provides don't in the least seem worth breaking backwards-compatibility over, sure, the code might be slightly shorter and better looking, but that's hardly relevant at all to the end user. Backwards-compatibility, however, absolutely is.

The new algorithm also apparently doesn't accept a salt like Hashids, so the user has no choice but to shuffle the the alphabet manually, which is work that they didn't have to do in the previous implementation. How this is considered an "improvement" is beyond me...

There is really no advantage to the new algorithm from the user's POV. Please seriously reconsider this decision.

Yes, the new name is better as it gets rid of the connotations of the word "hash" and the unification of all the implementations in different languages under the same organization is certainly a good thing, but the breaking compatibility with the old algorithm which is arguably the most important thing when it comes to a library like this, is genuinely unacceptable. It makes no sense. And "use it for new projects" is not a good argument, this makes people lose significant trust in this whole project, and that is not trivial, the decision to use Hashids/Sqids is essentially permenant, so trust, reliability, and future-proofness are extremely important.

Don't do this.

Sqids adds too many junk characters when encoding multiple numbers with a min length specified

Hi there, I just noticed something:
When no min length is specified:

const sqids = new Sqids({
    alphabet: '3SO4VRHEP8K0W1NJMCBT6DYA79Q2ZIG5UXFL',
});

console.log(sqids.encode([0, 1000]));
console.log(sqids.encode([0, 1001]));
console.log(sqids.encode([1, 1]));
console.log(sqids.encode([1, 2]));

The result:

P4X75
VIYB0
FHQ5
ZDT5

Now, if you set minLength to 5, this will be the result:

P4X75
VIYB0
L428TX
FS856C

Why did the last two suddenly jump to 6 characters?! That seems unnecessary given that they were just encoded with 4 characters, why does specifying the minLength cause Sqids to add 2 more characters, rather than 1, which would've been sufficient to meet the min-length requirement?

This doesn't actually make sense, correct me if I'm wrong but it seems like some sort of a bug.
It can be especially undesirable since usually the point of setting a min length is aesthetic uniformity, which this particular quirk basically defeats.

Specification version

Would be good to have some kind of versioning for the specification. This would allow implementations to specify which version they are compliant with.

Update .NET code example on the site

Not sure if this is the right place to post this but sqids-dotnet v2 has just been released which added support for all integral numeric types in .NET (e.g. int, long, etc.) โ€” previously we only supported int.
This resulted in an API breaking change where the SqidsEncoder class now requires the user to specify a type argument upon instantiation; so, all three code snippets on the website need to be updated:

-  var sqids = new SqidsEncoder(...
+  var sqids = new SqidsEncoder<int>(...

A few questions

Hi @4kimov, as I was building sqids-dotnet a number of questions popped into my mind which I'd figured I ask here (the first one is the most important one):

1.

It seems like in the new algorithm, any string that contains valid characters (valid meaning characters that are found in the alphabet) will always be decoded into some number; I tried feeding the decode function random strings and it gave me a result every time.

This looks like a crucial point of difference compared to the old Hashids, which was not like this. This means that in Sqids, any given number effectively has multiple valid encoded representations. I'm not sure if this was also technically true of Hashids but in Sqids it's far more extreme and apparent, since basically any string you give it, it always yields a number.

So, my question is: Was this somehow intentional? Or was it merely a side-effect of the simplification of the algorithm?

This can be problematic. In the context of a web app, for instance, where these IDs are typically used as part of the URL to identify a resource, this practically means that multiple different URLs can point to the same resource, which results in URL duplication and is virtually always undesirable. For example โ€” with the default Sqids options:

  • 2fs is decoded as 3168
  • OSc is also decoded as 3168

So, if you have a product, say, with the ID 3168, you'll have more than one URL that points to it:

This is not good. Have you thought this through?

2.

Why does the min-length have to be less than the length of the alphabet? Characters are allowed to repeat in IDs, right? I'm probably missing something, but I'm not sure I understand the logic behind this particular limitation here:

sqids-spec/src/index.ts

Lines 39 to 47 in d33db79

if (
typeof minLength != 'number' ||
minLength < this.minValue() ||
minLength > alphabet.length
) {
throw new TypeError(
`Minimum length has to be between ${this.minValue()} and ${alphabet.length}`
);
}

3.

In the decode function, wouldn't it make sense to add another if clause like the one below that returns an empty array if the input ID contains fewer characters than minLength?

sqids-spec/src/index.ts

Lines 201 to 204 in d33db79

// if an empty string, return an empty array
if (id == '') {
return ret;
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.