lc-core's Introduction

LibreCaptcha

LibreCaptcha is a framework that allows developers to create their own CAPTCHAs. The framework defines the API for a CAPTCHA generator and takes care of mundane details such as:

  • An HTTP interface for serving CAPTCHAs
  • Background workers to pre-compute CAPTCHAs and to store them in a database
  • Managing secrets for the CAPTCHAs (tokens, expected answers, etc.)
  • Safe re-impressions of CAPTCHA images (by creating unique tokens for every impression)
  • Garbage collection of stale CAPTCHAs
  • Sandboxed plugin architecture (TBD)

Some sample CAPTCHA generators are included in the distribution (see below). We will continue adding more samples to the list. For quick deployments, the samples themselves might be sufficient. Projects with more resources might want to create their own CAPTCHAs and use the samples as inspiration. See the CAPTCHA creation guide.

Current Status

The framework is stable, but since it is our first public release, we recommend using it only on small to medium scale web apps.

The sample CAPTCHAs are also just that, samples. They have not been tested against bots or CAPTCHA crackers yet.

Quick start with Java

  1. Download the jar file from the latest release
  2. Type mkdir data/. (The data directory stores a config file that you can tweak, as well as the database.)
  3. Type java -jar LibreCaptcha.jar
  4. Open localhost:8888/demo/index.html in browser

We recommend a Java 11+ runtime as that's what we compile the code with.

Alternatively,

  1. Install sbt
  2. Clone this repository
  3. Type sbt run within the repository
  4. Open localhost:8888/demo/index.html in browser

Quick start with Docker

Using docker-compose:

git clone https://github.com/librecaptcha/lc-core.git
cd lc-core
docker-compose up

Using docker:

docker run -p=8888:8888 -v ./lcdata:/lc-core/data librecaptcha/lc-core:2.0

A default config.json is automatically created in the mounted volume.

The above commands should work with podman as well, if docker.io registry is pre-configured. Otherwise, you can manually specify the repository like so:

podman run -p=8888:8888 -v ./lcdata:/lc-core/data docker.io/librecaptcha/lc-core:2.0

Quick test

Open localhost:8888/demo/index.html in browser.

Alternatively, on the command line, try:

> $ curl -d '{"media":"image/png","level":"easy","input_type":"text","size":"350x100"}' localhost:8888/v2/captcha
{"id":"3bf928ce-a1e7-4616-b34f-8252d777855d"}

> $ curl "localhost:8888/v2/media?id=3bf928ce-a1e7-4616-b34f-8252d777855d" -o sample.png

> $ file sample.png
sample.png: PNG image data, 350 x 100, 8-bit/color RGB, non-interlaced

The API endpoints are described at the end of this file.

Configuration

If a config.json file is not present in the data/ folder, the app creates one. This file can be modified to customize app features, such as which CAPTCHAs are enabled and their difficulty settings.

More details can be found in the wiki

Why LibreCaptcha?

Eliminate dependency on a third party

An open-source CAPTCHA framework allows anyone to host their own CAPTCHA service and thus avoid dependencies on third parties.

Respecting user privacy

A self-hosted service prevents user information from leaking to other parties.

More variety of CAPTCHAs

Ain't it boring to identify photos of buses, store-fronts and traffic signals? With LibreCaptcha, developers can create CAPTCHAs that suit their application and audience, with matching themes and looks.

And the greater the variety of CAPTCHAs, the harder it is for bots to crack them.

Sample CAPTCHAs

These are included in this server.

ShadowText

ShadowText Sample

FilterCaptcha

FilterCaptcha Sample

An image of a random string of letters is created. Then a series of image filters that add effects such as Smear, Diffuse, and Ripple is applied to the image to make it less readable.

RainDropsCaptcha

RainDrops Sample

PoppingCharactersCaptcha

PoppingCharacters Sample

LabelCaptcha

This CAPTCHA provider takes in two sets of images: one whose labels are known, and one whose labels are unknown. The created image has a pair of words, one from each set. The user is tested on the known word, and their answer to the unknown word is recorded. If a sufficient number of users agree on their answer to the unknown word, it is transferred to the list of known words.

(There is a known issue with this provider; see issue #68)


HTTP API

The service can be accessed using a simple HTTP API.

- /v2/captcha: POST

  • Parameters:

    • level: String - The difficulty level of a captcha
      • easy
      • medium
      • hard
    • input_type: String - The type of input option for a captcha
      • text
      • (More to come)
    • media: String - The type of media of a captcha
      • image/png
      • image/gif
      • (More to come)
    • size: String - The dimensions of a captcha. It needs to be a string in the format "widthxheight" in pixels, and will be matched with the allowedSizes config setting. Example: size: "450x200" which requests an image of width 450 and height 200 pixels.
  • Returns:

    • id: String - The uuid of the captcha generated
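
The size rule above can be sketched roughly as follows. The helper names parseSize and isAllowedSize are hypothetical, and the list passed in merely stands in for the allowedSizes config setting; the server's actual matching logic may differ.

```javascript
// Hypothetical helper: parse a "widthxheight" string into numbers.
// Returns null when the string does not have that shape.
function parseSize(size) {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) return null;
  return { width: Number(match[1]), height: Number(match[2]) };
}

// Hypothetical helper: a size is accepted only if it is well formed
// and appears in the list of allowed sizes.
function isAllowedSize(size, allowedSizes) {
  return parseSize(size) !== null && allowedSizes.includes(size);
}

console.log(parseSize("450x200"));
console.log(isAllowedSize("450x200", ["350x100", "450x200"]));
```

A client could run a check like this before sending the request, so that a malformed size is caught without a round trip.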

- /v2/media: GET

  • Parameters:

    • id: String - The uuid of the captcha
  • Returns:

    • image: Array[Byte] - The requested media as bytes

- /v2/answer: POST

  • Parameters:

    • id: String - The uuid of the captcha that needs to be solved
    • answer: String - The answer to the captcha that needs to be validated
  • Returns:

    • result: String - The result after validation/checking of the answer
      • True - If the answer is correct
      • False - If the answer is incorrect
      • Expired - If the time limit for solving the captcha has been exceeded
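
As an illustration of how a client might react to the three result values, here is a minimal sketch. Only the strings "True", "False" and "Expired" come from the description above; the function name nextAction and the action names are assumptions made for this example.

```javascript
// Map the documented result strings to a client-side action.
// The returned action names are hypothetical.
function nextAction(result) {
  switch (result) {
    case "True":
      return "accept";
    case "False":
      return "reject";
    case "Expired":
      return "request-new-captcha";
    default:
      // Anything else is treated as an unexpected server response.
      return "error";
  }
}

console.log(nextAction("Expired"));
```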

Example usage

In JavaScript:

const resp = await fetch("/v2/captcha", {
    method: 'POST',
    body: JSON.stringify({level: "easy", media: "image/png", input_type: "text", size: "350x100"})
});

const respJson = await resp.json();

let captchaId = null;

if (resp.ok) {
    // The CAPTCHA can be displayed using the data in respJson.
    console.log(respJson);
    // Store the id somewhere so that it can be used later for answer verification
    captchaId = respJson.id;
} else {
    console.error(respJson);
}


// When the user submits an answer, it can be sent to the server for verification:
const answerResp = await fetch("/v2/answer", {
    method: 'POST',
    body: JSON.stringify({id: captchaId, answer: "user input"})
});
const answerJson = await answerResp.json();
console.log(answerJson.result);

Roadmap

Things to do in the future:

  • Sandboxed plugin architecture
  • Audio CAPTCHA samples
  • Interactive CAPTCHA samples

lc-core's People

Contributors

blackmagic0, hrj, korkman, prajwalgoudar, rr83019, sanblig, scala-steward, vinceh121

lc-core's Issues

Performance test - Object closed error

This is the most common type of error; it occurs on both the /captcha and /answer endpoints.

<Response [500]>
<!DOCTYPE html>
<html>
<head><title>500 Internal Server Error</title></head>
<body><h1>500 Internal Server Error</h1>
<p>Error processing request: The object is already closed [90007-197]</p>
</body></html>

Create one text distorted captcha

As discussed, this captcha will use multiple fonts, multiple colors, random rotation and random spacing.

It should accept a folder as input, which contains 3 sub folders: easy, medium, hard which contain fonts of corresponding complexity.

Create a sample audio captcha

We could use a Text-to-speech (TTS) system to generate an audio sample, and store it using a well known format, like Opus, etc.

The stretch goal would be to do it in such a way that voice recognition software can't recognise it. So we might have to add some distortions to the audio.

Improve error responses from API

  • Missing or unparseable parameters should be reported. It is okay if the error mentions just one missing / wrong parameter
  • API Errors should return JSON responses, rather than HTML
  • We should probably use something like OAS to describe the endpoints and the error responses
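
One possible shape for such a JSON error is sketched below. The function name, the error fields, and the assumption that all four request parameters are required are illustrative only; they are not taken from the existing API.

```javascript
// Assumption: these four parameters are all required for a captcha request.
const REQUIRED_PARAMS = ["level", "media", "input_type", "size"];

// Returns null when the body looks acceptable, otherwise a JSON-serialisable
// error that names the first missing or wrong parameter.
function validateCaptchaRequest(body) {
  for (const name of REQUIRED_PARAMS) {
    if (typeof body[name] !== "string" || body[name] === "") {
      return {
        error: "invalid_parameter",
        parameter: name,
        message: `Missing or invalid parameter: ${name}`,
      };
    }
  }
  return null;
}

console.log(JSON.stringify(validateCaptchaRequest({ level: "easy" })));
```

Reporting only the first offending parameter keeps the check simple, which matches the first bullet above.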

Performance test - Unique index or primary key violation

This error occurs on /captcha endpoint

<Response [500]>
<!DOCTYPE html>
<html>
<head><title>500 Internal Server Error</title></head>
<body><h1>500 Internal Server Error</h1>
<p>Error processing request: Unique index or primary key violation: &quot;PUBLIC.PRIMARY_KEY_4 ON PUBLIC.MAPID(UUID) VALUES 36797&quot;; SQL statement:
INSERT INTO mapId(uuid, token) VALUES (?, ?) [23505-200]</p>
</body></html>

Check licenses of dependencies

The JLHTTP library being used currently is GPL licensed. Let's try to find a replacement for it.

Also need to check other dependencies!

Support batched creation of Captchas

For providers such as LabelCaptcha, a set of generated Captchas might be only valid for the state of the system at generation time. As the state changes, for example when the set of known and unknown words changes, the older Captchas might become irrelevant.

So, to address this, I propose creating Captchas in batches. For example, one possible design could be:

  • When running low on Captchas, a Captcha provider which has no unsolved captchas is picked for generating more captchas.
  • A NewBatch message is sent to the provider. In its handler, the provider might update its state, for example, reclassify known and unknown words.
  • The CreateChallenge message is then sent N times to this provider, where N is configurable.
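
The proposed flow could look roughly like this. ExampleProvider, its fields, and the refill function are hypothetical names used only to illustrate the three steps; the real provider interface is not shown here.

```javascript
// Hypothetical provider exposing the two handlers named in the proposal.
class ExampleProvider {
  constructor() {
    this.batch = 0;     // how many batches have been started
    this.unsolved = 0;  // captchas created but not yet solved
  }
  newBatch() {
    // A real provider might reclassify known and unknown words here.
    this.batch += 1;
  }
  createChallenge() {
    this.unsolved += 1;
    return { batch: this.batch };
  }
}

// Pick a provider with no unsolved captchas, start a new batch,
// then create n challenges from it.
function refill(providers, n) {
  const provider = providers.find((p) => p.unsolved === 0);
  if (!provider) return [];
  provider.newBatch();
  return Array.from({ length: n }, () => provider.createChallenge());
}

console.log(refill([new ExampleProvider()], 3).length);
```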

Create a sample text captcha

An example text challenge could be,

7 is a ___ number? Your Choices: "even", "odd", "prime", "irrational"
(both "prime" and "odd" could be valid answers even though odd is grammatically not right)

or

Write "x73dfg" in reverse
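
The second example can be sketched in a few lines. The function names and the shape of the challenge object are assumptions for illustration.

```javascript
// Build a "write this in reverse" challenge for a given string.
function makeReverseChallenge(text) {
  return {
    question: `Write "${text}" in reverse`,
    expectedAnswer: [...text].reverse().join(""),
  };
}

// Compare the user's answer with the expected one, ignoring outer whitespace.
function checkReverseAnswer(challenge, answer) {
  return answer.trim() === challenge.expectedAnswer;
}

const challenge = makeReverseChallenge("x73dfg");
console.log(challenge.question);
console.log(checkReverseAnswer(challenge, "gfd37x"));
```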

Efficiently create tokens that are unique across restarts

One idea is to have an auto-incrementing field in the database table where the token is inserted. Could be the primary id of the table.

Encrypt and Decrypt this field value on the fly when sending the token to the client. A fast symmetric cipher could be used for the purpose. The key for the cipher would have to be persisted somewhere (perhaps it could be specified in the config file), so that it survives restarts.
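
A minimal sketch of this idea, using Node's built-in crypto module, is shown below. The choice of AES-128 over a single 16-byte block, the function names, and the zero-padding check are assumptions made for the example, not a description of the current implementation. The key is passed in by the caller; in practice it would be loaded from persistent storage such as the config file.

```javascript
const crypto = require("node:crypto");

// Encrypt a numeric row id into an opaque hex token.
// The id is stored in the last 8 bytes of a zero-filled 16-byte block.
function encryptId(id, key) {
  const block = Buffer.alloc(16);
  block.writeBigUInt64BE(BigInt(id), 8);
  const cipher = crypto.createCipheriv("aes-128-ecb", key, null);
  cipher.setAutoPadding(false); // the input is exactly one cipher block
  return Buffer.concat([cipher.update(block), cipher.final()]).toString("hex");
}

// Decrypt a token back into the row id.
// Returns null if the leading 8 bytes are not all zero, which suggests a forged token.
// A token that is not exactly one block long makes decipher.final() throw.
function decryptToken(token, key) {
  const decipher = crypto.createDecipheriv("aes-128-ecb", key, null);
  decipher.setAutoPadding(false);
  const block = Buffer.concat([
    decipher.update(Buffer.from(token, "hex")),
    decipher.final(),
  ]);
  if (!block.subarray(0, 8).equals(Buffer.alloc(8))) return null;
  return Number(block.readBigUInt64BE(8));
}

const demoKey = crypto.randomBytes(16); // stand-in for a persisted key
console.log(decryptToken(encryptId(42, demoKey), demoKey));
```

Because every id is unique and fills exactly one block, each token is unique as well, and the same id always maps to the same token as long as the key survives restarts.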

Ability to specify config file

It should be possible to specify a different path for the config file from the command line.

Hence, Config should be converted to a class that takes the file path as a constructor parameter.

LabelCaptcha can produce impossible captchas when irrelevant words are maliciously added to the known words list

In lc.captchas.checkAnswer, the first word is a known word and the second word is an unknown word. The known word input must be correct for the captcha to be solved, and if it is, then the unknown word input is added to the unknown answers list. Once an unknown answer has been given 3 times AND makes up more than 90% of the total unknown answers for the file, it is added to the known word list.
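
The promotion rule described above can be written as a small predicate. This is a sketch of the rule as stated in this issue, not the actual code in lc.captchas; the function name and parameters are hypothetical.

```javascript
// answers: every answer submitted so far for one unknown image.
// candidate: the answer being considered for promotion to a known word.
function shouldPromote(answers, candidate, minCount = 3, minShare = 0.9) {
  const count = answers.filter((a) => a === candidate).length;
  return count >= minCount && count / answers.length > minShare;
}

// Three identical answers already satisfy both conditions.
console.log(shouldPromote(["hwaxvozaoo", "hwaxvozaoo", "hwaxvozaoo"], "hwaxvozaoo"));
```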

Because the second word is always the unknown word, a bad actor who repeatedly requests and answers captchas may end up solving the same captcha the first three times it is used. In this case, they could give the same irrelevant word as their unknown word input all 3 times (e.g. "hwaxvozaoo"), and it would be accepted as a known word.

Future users would then be given the word "hwaxvozaoo" as the expected answer to the known word input, making the captcha unsolvable for anyone who is shown it.

I can think of a way to mitigate this, but there may be better options: record the number of succeeded and failed attempts for each known word per image. Known words with a high number and proportion of failed attempts should be removed from the known words list.
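
The mitigation could be sketched as follows. The statistics object, the function name, and both thresholds are assumptions chosen for illustration; suitable values would need to be tuned.

```javascript
// stats: { succeeded, failed } attempt counts for one known word on one image.
// A word is demoted once it has enough attempts and most of them failed.
function shouldDemote(stats, minAttempts = 10, maxFailureRate = 0.8) {
  const attempts = stats.succeeded + stats.failed;
  if (attempts < minAttempts) return false;
  return stats.failed / attempts > maxFailureRate;
}

console.log(shouldDemote({ succeeded: 1, failed: 19 }));
```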

Separate code into modules

  • lc-interface: can contain the interfaces that a captcha provider needs to implement
  • lc-utils : can contain common utility functions
  • lc-core: can contain the framework
  • lc-server : can contain the HTTP API

Use OCR in locust tests

We are currently always testing with wrong answers. For proper testing, we need to submit answers that are known to be right or wrong.

For the tests to know what the right answer is, there are two possibilities:

  1. We add a debug flag, which provides a hint in the challenge response. However, this requires us to put compromisable code in production path.
  2. We add a DebugCaptchaProvider, which provides very simple challenges that are easily solved with OCR. This can be disabled by default, and a config.json can be provided by test scripts that enable this provider.

I prefer option 2 because it means production code needn't have any hacks in it.

Accessibility

Many traditional CAPTCHAs are difficult for people that have trouble with sight. What is the best CAPTCHA alternative to use with accessibility in mind?

Performance test - Null Pointer Exception

This error occurs on /captcha endpoint

<Response [500]>
<!DOCTYPE html>
<html>
<head><title>500 Internal Server Error</title></head>
<body><h1>500 Internal Server Error</h1>
<p>Error processing request: General error: &quot;java.lang.NullPointerException&quot; [50000-197]</p>
</body></html>

Reduce the solvability of CAPTCHAs using OCR

As reported here, it is currently easy to break the RainDropsCaptcha.

An easy fix for this might be to ensure the background color and foreground color of the text are the same.

Another could be to fuzz the boundary of the characters.

Need to similarly check other sample CAPTCHAs

API: Every challenge should return a list of example challenges

When the API returns a challenge, it can also return a list of example challenges in this form:

examples: [
 {token: "xxx", expectedAnswer: "aaa"},
 {token: "yyy", expectedAnswer: "bbb"},
 ...
]

To enable this,

  • we will need to add a generateExample method to ChallengeProvider interface. A separate method is required because we don't want an opaque secret in this case, but an expected answer.
  • framework will call this method whenever examples are missing for a challenge provider.
  • examples can be stored in the DB
  • we will need to add a getExampleContent HTTP API which accepts an example token

Support for configuration file

On startup, LibreCaptcha will load a configuration file, preferably in JSON or HOCON format.

Example of what it could look like:

{
  "randomSeed": 20,
  "rateLimitPerUserPerSecond": 100,
  "captchas": {
    "rainDropsCaptcha": {
      "foregroundColor": "blue",
      "difficulty": "medium, hard"
    },
    "blurCaptcha": {
      "blurRadius": "10px"
    }
  }
}

Initially, this will be a site-wide configuration file. But eventually, every user should be able to make their own captcha configuration.

The captchas field decides which captcha providers get enabled. Also, the captcha specific options will be passed down to the captcha provider.
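
To illustrate how the captchas field might drive provider selection, here is a small sketch. It assumes the config is strict JSON; the registry object, the factory functions, and enableProviders are hypothetical names, not part of the framework.

```javascript
// Hypothetical registry: config name -> factory that receives the provider options.
const registry = {
  rainDropsCaptcha: (options) => ({ name: "rainDropsCaptcha", options }),
  blurCaptcha: (options) => ({ name: "blurCaptcha", options }),
  shadowTextCaptcha: (options) => ({ name: "shadowTextCaptcha", options }),
};

// Enable only the providers listed under "captchas" and pass their options down.
// Names that are not in the registry are ignored in this sketch.
function enableProviders(configText, providerRegistry) {
  const config = JSON.parse(configText);
  return Object.entries(config.captchas ?? {})
    .filter(([name]) => name in providerRegistry)
    .map(([name, options]) => providerRegistry[name](options));
}

const sampleConfig = '{"captchas": {"blurCaptcha": {"blurRadius": "10px"}}}';
console.log(enableProviders(sampleConfig, registry));
```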

Spray-JSON is a good choice for a JSON parser. It has no external dependencies.

Wordpress plugin

A simple plugin for WordPress that can showcase the framework and API.

It will also serve to help us understand practical requirements.

Don't garbage collect recently served captchas

  • Add a timestamp column to captcha table
  • Update the timestamp to current time whenever a captcha is served
  • In the garbage collecting query, add a clause to exclude captchas that have very recent timestamps
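
The extra clause amounts to a time comparison. The sketch below expresses it over in-memory objects rather than as a database query; the field name lastServedMs and the grace period are assumptions.

```javascript
// Keep only captchas that were last served longer ago than the grace period.
// Captchas served within the grace period are excluded from garbage collection.
function excludeRecentlyServed(candidates, nowMs, graceMs) {
  return candidates.filter((c) => nowMs - c.lastServedMs > graceMs);
}

const now = 1_000_000;
const candidates = [
  { id: "a", lastServedMs: now - 5_000 },   // served 5 seconds ago
  { id: "b", lastServedMs: now - 120_000 }, // served 2 minutes ago
];
console.log(excludeRecentlyServed(candidates, now, 60_000).map((c) => c.id));
```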

Error in LabelCaptcha during validation

The /answer endpoint has been throwing the following error whenever invoked after running the performance tests:
Error processing request: key not found: 4874

GC of solved captchas

  • Change solved column to an integer that counts number of times the captcha was solved
  • After every N captchas are created, delete the captchas that have been solved more than M times.
  • After every N*10 captchas are created, delete the captchas that have been solved more than M/2 times
  • Don't serve captchas that have been solved more than M/2 times

N and M need to be configurable, but for now, we can hard-code them to 1000 and 10 respectively.
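
The rules above can be condensed into two small functions. The function names are hypothetical, and N and M use the hard-coded defaults suggested in this issue.

```javascript
const N = 1000; // run a GC pass after every N created captchas
const M = 10;   // base limit on how many times a captcha may be solved

// Returns the solved-count limit to apply after createdCount captchas have been
// created (delete captchas solved more than this), or null if no GC pass is due.
function gcSolvedLimit(createdCount, n = N, m = M) {
  if (createdCount === 0 || createdCount % n !== 0) return null;
  return createdCount % (n * 10) === 0 ? m / 2 : m;
}

// Captchas solved more than M/2 times are no longer served.
function canServe(solvedCount, m = M) {
  return solvedCount <= m / 2;
}

console.log(gcSolvedLimit(1000), gcSolvedLimit(10000), canServe(6));
```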
