
lc-core's Issues

Create a sample text captcha

An example text challenge could be,

7 is a ___ number? Your Choices: "even", "odd", "prime", "irrational"
(both "prime" and "odd" could be valid answers, even though "a odd number" is not grammatically correct)

or

Write "x73dfg" in reverse
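
A minimal sketch of the second idea, assuming a text challenge is just a prompt plus an expected answer. The names make_reverse_challenge and check_answer, and the choice of alphabet, are hypothetical and not part of lc-core.

```python
import random
import string


def make_reverse_challenge(rng: random.Random, length: int = 6) -> tuple[str, str]:
    """Return (prompt, expected_answer) for a 'write this in reverse' challenge."""
    alphabet = string.ascii_lowercase + string.digits
    text = "".join(rng.choice(alphabet) for _ in range(length))
    prompt = f'Write "{text}" in reverse'
    return prompt, text[::-1]


def check_answer(expected: str, given: str) -> bool:
    # Ignore surrounding whitespace and letter case when comparing.
    return given.strip().lower() == expected


# Seeding the generator makes the challenge reproducible, e.g. in tests.
prompt, expected = make_reverse_challenge(random.Random(20))
```

The expected answer would stay on the server; only the prompt is sent to the client.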

Support for configuration file

On startup, libreCaptcha will load a configuration file, preferably in JSON or HOCON format.

Example of what it could look like:

{
  randomSeed: 20,
  rateLimitPerUserPerSecond: 100,
  captchas: {
    "rainDropsCaptcha": {
      "foregroundColor": "blue",
      "difficulty": "medium, hard"
    },
    "blurCaptcha": {
      "blurRadius": "10px"
    }
  }
}

Initially, this will be a site-wide configuration file. But eventually, every user should be able to make their own captcha configuration.

The captchas field decides which captcha providers get enabled. Also, the captcha specific options will be passed down to the captcha provider.

Spray-JSON is a good choice for a JSON parser. It has no external dependencies.
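
To illustrate how the captchas field could drive provider selection, here is a rough sketch. It is written in Python purely for illustration (the project itself would use a JVM parser such as Spray-JSON), it uses strict JSON with quoted keys rather than HOCON, and the function name enabled_providers is an assumption.

```python
import json

# Strict-JSON version of the example configuration (all keys quoted).
SAMPLE_CONFIG = """
{
  "randomSeed": 20,
  "rateLimitPerUserPerSecond": 100,
  "captchas": {
    "rainDropsCaptcha": {"foregroundColor": "blue", "difficulty": "medium, hard"},
    "blurCaptcha": {"blurRadius": "10px"}
  }
}
"""


def enabled_providers(config_text: str) -> dict:
    """Map each enabled provider name to the options it should receive."""
    config = json.loads(config_text)
    # A provider is enabled simply by appearing under "captchas".
    return dict(config.get("captchas", {}))


for name, options in enabled_providers(SAMPLE_CONFIG).items():
    # In the framework, options would be passed down to the provider here.
    print(name, options)
```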

Use OCR in locust tests

We are currently always testing with wrong answers. For proper testing, we need to submit answers that are known to be right or wrong.

For the tests to know what the right answer is, there are two possibilities:

  1. We add a debug flag, which provides a hint in the challenge response. However, this requires us to put compromisable code in the production path.
  2. We add a DebugCaptchaProvider, which provides very simple challenges that are easily solved with OCR. This can be disabled by default, and a config.json can be provided by test scripts that enable this provider.

I prefer option 2 because it means production code needn't have any hacks in it.

LabelCaptcha can produce impossible captchas when irrelevant words are maliciously added to the known words list

In lc.captchas.checkAnswer, the first word is a known word and the second word is an unknown word. The known word input must be correct for the captcha to be solved, and if it is, then the unknown word input is added to the unknown answers list. Once an unknown answer has been given 3 times AND is more than 90% of the total unknown answers for the file, it is added to the known word list.
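
The promotion rule described above can be sketched roughly as follows. This is an illustration of the rule as stated, not the actual lc-core code; the function name promotable_word is hypothetical, and "given 3 times" is assumed to mean "at least 3 times".

```python
from collections import Counter


def promotable_word(unknown_answers, min_count=3, min_share=0.9):
    """Return the answer that qualifies as a known word, or None."""
    if not unknown_answers:
        return None
    word, count = Counter(unknown_answers).most_common(1)[0]
    share = count / len(unknown_answers)
    # Both conditions must hold: enough repetitions AND a dominant share.
    if count >= min_count and share > min_share:
        return word
    return None
```

Under this rule, three identical submissions with no competing answers are enough for promotion, whatever the word is.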

Because the second word is always the unknown word, if a bad actor were to repeatedly request and answer captchas, they may end up solving the same captcha the first three times it is used. In this case, they could give the same irrelevant word as their unknown word input all 3 times (e.g. "hwaxvozaoo"), and it would be accepted as a known word.

Future users would then be given the word "hwaxvozaoo" as the expected answer to the known word input, so the captcha would be unsolvable for every user who is shown it.

I can think of a way to mitigate this, but there may be better options: record the number of succeeded and failed attempts of each known word per image. Known words with a high number and proportion of failed attempts should be removed from the known words list.
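
A rough sketch of that mitigation, assuming per-word counters are kept for each image. The class WordStats, the function should_demote, and both thresholds are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordStats:
    succeeded: int = 0
    failed: int = 0


def should_demote(stats, min_failures=20, failure_ratio_threshold=0.9):
    """True when a known word fails often enough, in count and proportion, to be distrusted."""
    attempts = stats.succeeded + stats.failed
    if attempts == 0:
        return False
    # Require both a high absolute number and a high proportion of failures,
    # so that a few unlucky attempts do not remove a legitimate word.
    return (
        stats.failed >= min_failures
        and stats.failed / attempts >= failure_ratio_threshold
    )
```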

Performance test - Object closed error

This is the most common type of error, occurring on both the /captcha and /answer endpoints

<Response [500]>
<!DOCTYPE html>
<html>
<head><title>500 Internal Server Error</title></head>
<body><h1>500 Internal Server Error</h1>
<p>Error processing request: The object is already closed [90007-197]</p>
</body></html>

Improve error responses from API

  • Missing or unparseable parameters should be reported. It is okay if the error mentions just one missing / wrong parameter
  • API Errors should return JSON responses, rather than HTML
  • We should probably use something like OAS to describe the endpoints and the error responses
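
As a sketch of what a JSON error reply might look like, assuming a simple envelope with an "error" object. The field names and the helper error_response are hypothetical; the real shape would be settled by the API description.

```python
import json
from typing import Optional


def error_response(status: int, message: str, parameter: Optional[str] = None):
    """Build (status, headers, body) for a JSON error reply."""
    error = {"status": status, "message": message}
    if parameter is not None:
        # Naming just the first missing or invalid parameter is enough.
        error["parameter"] = parameter
    body = json.dumps({"error": error})
    return status, {"Content-Type": "application/json"}, body


status, headers, body = error_response(400, "Missing required parameter", "id")
```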

Error in LabelCaptcha during validation

The /answer endpoint has been throwing the following error whenever invoked, after running the performance tests:
Error processing request: key not found: 4874

Performance test - Null Pointer Exception

This error occurs on the /captcha endpoint

<Response [500]>
<!DOCTYPE html>
<html>
<head><title>500 Internal Server Error</title></head>
<body><h1>500 Internal Server Error</h1>
<p>Error processing request: General error: &quot;java.lang.NullPointerException&quot; [50000-197]</p>
</body></html>

Check licenses of dependencies

The JLHTTP library being used currently is GPL licensed. Let's try to find a replacement for it.

Also need to check other dependencies!

Don't garbage collect recently served captchas

  • Add a timestamp column to captcha table
  • Update the timestamp to current time whenever a captcha is served
  • In the garbage collecting query, add a clause to exclude captchas that have very recent timestamps

Accessibility

Many traditional CAPTCHAs are difficult for people who have trouble with sight. What is the best CAPTCHA alternative to use with accessibility in mind?

API: Every challenge should return a list of example challenges

When the API returns a challenge, it can also return a list of example challenges in this form:

examples: [
 {token: "xxx", expectedAnswer: "aaa"},
 {token: "yyy", expectedAnswer: "bbb"},
 ...
]

To enable this,

  • We will need to add a generateExample method to the ChallengeProvider interface. A separate method is required because we don't want an opaque secret in this case, but an expected answer.
  • The framework will call this method whenever examples are missing for a challenge provider.
  • Examples can be stored in the DB.
  • We will need to add a getExampleContent HTTP API which accepts an example token.

Efficiently create tokens that are unique across restarts

One idea is to have an auto-incrementing field in the database table where the token is inserted. This could be the primary key of the table.

Encrypt and decrypt this field value on the fly when exchanging the token with the client. A fast symmetric cipher could be used for the purpose. The key for the cipher would have to be persisted somewhere (perhaps it could be specified in the config file), so that it survives restarts.
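
The idea can be sketched as a keyed, reversible mapping from the row id to an opaque token. The example below uses a small Feistel network built on HMAC-SHA256 only because it needs nothing beyond the Python standard library; it is a stand-in for the "fast symmetric cipher", not a vetted construction, and a reviewed cipher from a crypto library would be the safer choice. The function names and the 64-bit id size are assumptions.

```python
import hashlib
import hmac

ROUNDS = 4
MASK32 = 0xFFFFFFFF


def _round(key: bytes, round_no: int, half: int) -> int:
    # Keyed round function: first 32 bits of HMAC-SHA256(key, round || half).
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encode_token(key: bytes, row_id: int) -> str:
    """Map a row id (0 <= row_id < 2**64) to a 16-character hex token."""
    left, right = (row_id >> 32) & MASK32, row_id & MASK32
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return f"{(left << 32) | right:016x}"


def decode_token(key: bytes, token: str) -> int:
    """Invert encode_token by running the rounds backwards."""
    value = int(token, 16)
    left, right = (value >> 32) & MASK32, value & MASK32
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 32) | right
```

Because the mapping is a permutation of the id space, distinct ids always yield distinct tokens, and uniqueness across restarts follows from the database sequence plus the persisted key.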

Create a sample audio captcha

We could use a text-to-speech (TTS) system to generate an audio sample, and store it using a well-known format such as Opus.

The stretch goal would be to do it in such a way that voice recognition software can't recognise it. So we might have to add some distortions to the audio.

WordPress plugin

A simple plugin for WordPress that can showcase the framework and API.

It will also serve to help us understand practical requirements.

Create one text distorted captcha

As discussed, this captcha will use multiple fonts, multiple colors, random rotation and random spacing.

It should accept a folder as input, which contains three subfolders (easy, medium, and hard) holding fonts of corresponding complexity.

Ability to specify config file

It should be possible to specify a different path for the config file from the command line.

Hence, Config should be converted to a class which takes the file path as a constructor parameter.
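
A minimal sketch of the shape this could take, written in Python for illustration only. The flag name --config, the default path config.json, and the get helper are assumptions rather than the project's actual interface.

```python
import argparse
import json
from pathlib import Path


class Config:
    """Configuration loaded from the file path given to the constructor."""

    def __init__(self, path):
        self.path = Path(path)
        self.values = json.loads(self.path.read_text(encoding="utf-8"))

    def get(self, key, default=None):
        return self.values.get(key, default)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Start the captcha server")
    parser.add_argument(
        "--config",
        default="config.json",
        help="path to the configuration file",
    )
    return parser.parse_args(argv)
```

At startup the server would then build its configuration with Config(parse_args().config), so tests can point it at their own file.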

GC of solved captchas

  • Change solved column to an integer that counts number of times the captcha was solved
  • After every N captchas are created, delete the captchas that have been solved more than M times.
  • After every N*10 captchas are created, delete the captchas that have been solved more than M/2 times
  • Don't serve captchas that have been solved more than M/2 times

N and M need to be configurable, but for now, we can hard-code them to 1000 and 10 respectively.
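
The rules above can be sketched as follows, again using sqlite3 as a stand-in for the project's database. The schema and the function names are hypothetical; N and M use the hard-coded values suggested above.

```python
import sqlite3

N = 1000  # run the GC after every N captchas created
M = 10    # solve-count threshold

SCHEMA = "CREATE TABLE captcha (token INTEGER PRIMARY KEY, solved INTEGER NOT NULL DEFAULT 0)"


def collect_solved(conn, created_count, n=N, m=M):
    """Apply the deletion rule that matches the number of captchas created so far."""
    if created_count == 0 or created_count % n != 0:
        return 0
    # Every N*10 creations the stricter M/2 threshold applies; it also
    # covers everything the regular M threshold would have deleted.
    threshold = m / 2 if created_count % (n * 10) == 0 else m
    cur = conn.execute("DELETE FROM captcha WHERE solved > ?", (threshold,))
    return cur.rowcount


def servable_tokens(conn, m=M):
    """Tokens that may still be served: solved at most M/2 times."""
    rows = conn.execute(
        "SELECT token FROM captcha WHERE solved <= ? ORDER BY token", (m / 2,)
    )
    return [row[0] for row in rows]
```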

Support batched creation of Captchas

For providers such as LabelCaptcha, a set of generated Captchas might be only valid for the state of the system at generation time. As the state changes, for example when the set of known and unknown words changes, the older Captchas might become irrelevant.

So, to address this, I propose creating Captchas in batches. For example, one possible design could be:

  • When running low on Captchas, a Captcha provider which has no unsolved captchas is picked for generating more captchas.
  • A NewBatch message is sent to the provider. In its handler, the provider might update its state, for example, reclassify known and unknown words.
  • The CreateChallenge message is then sent N times to this provider, where N is configurable.
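
One possible reading of this design, sketched with plain method calls standing in for the NewBatch and CreateChallenge messages. The class BatchingProvider, the function refill, and the challenge naming are all hypothetical.

```python
class BatchingProvider:
    """Hypothetical provider showing the NewBatch / CreateChallenge flow."""

    def __init__(self):
        self.batches_started = 0
        self.unsolved = []

    def new_batch(self):
        # Chance to refresh internal state, e.g. reclassify known and unknown words.
        self.batches_started += 1

    def create_challenge(self):
        challenge = f"batch{self.batches_started}-item{len(self.unsolved)}"
        self.unsolved.append(challenge)
        return challenge


def refill(providers, batch_size):
    """Pick a provider with no unsolved captchas and have it generate one batch."""
    for provider in providers:
        if not provider.unsolved:
            provider.new_batch()
            return [provider.create_challenge() for _ in range(batch_size)]
    # Every provider still has unsolved captchas; nothing to generate.
    return []
```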

Separate code into modules

  • lc-interface: can contain the interfaces that a captcha provider needs to implement
  • lc-utils: can contain common utility functions
  • lc-core: can contain the framework
  • lc-server: can contain the HTTP API

Reduce the solvability of CAPTCHAs using OCR

As reported here, it is currently easy to break the RainDropsCaptcha.

An easy fix for this might be to ensure the background color and foreground color of the text are the same.

Another could be to fuzz the boundary of the characters.

We need to similarly check the other sample CAPTCHAs.

Performance test - Unique index or primary key violation

This error occurs on the /captcha endpoint

<Response [500]>
<!DOCTYPE html>
<html>
<head><title>500 Internal Server Error</title></head>
<body><h1>500 Internal Server Error</h1>
<p>Error processing request: Unique index or primary key violation: &quot;PUBLIC.PRIMARY_KEY_4 ON PUBLIC.MAPID(UUID) VALUES 36797&quot;; SQL statement:
INSERT INTO mapId(uuid, token) VALUES (?, ?) [23505-200]</p>
</body></html>
