
hcaptcha-model-factory's People

Contributors

beiyuouo, qin2dim, vinyzu

hcaptcha-model-factory's Issues

feat(auto-label): Add example picture to the dataset

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
hCaptcha provides one example picture of the subject described in each challenge. We can add it to the training data, which grows the dataset by one image per captcha (9 challenge pictures + 1 example).

[bug]

image

This happens when trying to run python main.py --new

Question about training?

How do I train without reprocessing the entire set of pictures?
For example:
I trained on 50k images and then added 10k new ones. How do I retrain on the new 10k so that I do not have to process all 60k images?

how to label data

The wiki says the following:
I think you do not need a label tool for this task... Just drag to the corresponding label folder is enough. It's easy, right?

I find this hard to understand. Can you explain in detail how to label the data?
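
The wiki's advice amounts to moving each image into a folder named after its label. A minimal sketch of that idea, assuming the yes/bad folder layout mentioned elsewhere in these issues; `label_image` and the example paths are hypothetical names for illustration, not part of the project:

```python
import shutil
from pathlib import Path


def label_image(image_path, task_dir, label):
    """Move one image into <task_dir>/<label>/, creating the folder if needed."""
    if label not in ("yes", "bad"):
        raise ValueError(f"label must be 'yes' or 'bad', got {label!r}")
    target_dir = Path(task_dir) / label
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(image_path).name
    # Moving (not copying) keeps the unlabeled pile shrinking as you go.
    shutil.move(str(image_path), str(destination))
    return destination
```

For example, `label_image("unlabel/img_001.jpg", "data/sunflower", "yes")` would mark that picture as a positive sample. Dragging files in a file manager achieves the same result.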

mistake

When I try to run it, it returns this error. What could it be?

image

[bug]

I'm having this issue:

C:\Users\Sky\hcaptcha-model-factory\src>python main.py new
prompt[en] --> sunflower
2022-12-22 21:28:50 | DEBUG - Diagnose task | task_name=sunflower
Use AI to automatically label datasets? {'y', 'n'} --> y
please put all the images in the `unlabel` folder and press any key to continue...
2022-12-22 21:29:02 | INFO - Found 9 images in C:\Users\Sky\hcaptcha-model-factory\data\sunflower\unlabel
2022-12-22 21:29:02 | DEBUG - Extracting embeddings...
C:\Users\Sky\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2022-12-22 21:29:03 | INFO - Embeddings extracted
2022-12-22 21:29:03 | INFO - PCA..., shape of embs: (9, 512)
2022-12-22 21:29:03 | INFO - PCA done, shape of embs: (9, 9)
2022-12-22 21:29:03 | DEBUG - Clustering...
2022-12-22 21:29:03 | DEBUG - Clustering done
2022-12-22 21:29:03 | INFO - Saving labels...
2022-12-22 21:29:03 | DEBUG - Labels saved
2022-12-22 21:29:03 | SUCCESS - Auto labeling completed
Start automatic training? {'y', 'n'} --> y
2022-12-22 21:29:04 | DEBUG - Diagnose task | task_name=sunflower
2022-12-22 21:29:04 | ERROR - An error has been caught in function 'new', process 'MainProcess' (16996), thread 'MainThread' (4960):
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    Fire(Scaffold)
  File "C:\Users\Sky\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\Sky\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\Sky\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
> File "C:\Users\Sky\hcaptcha-model-factory\src\apis\scaffold.py", line 78, in new
    Scaffold.train(task=task)
  File "C:\Users\Sky\hcaptcha-model-factory\src\apis\scaffold.py", line 98, in train
    model = Scaffold._model or ResNet(
  File "C:\Users\Sky\hcaptcha-model-factory\src\factories\kernel.py", line 93, in __init__
    self._build_env()
  File "C:\Users\Sky\hcaptcha-model-factory\src\factories\resnet.py", line 64, in _build_env
    raise FileNotFoundError(
FileNotFoundError: The structure of the dataset is incomplete | dir=C:\Users\Sky\hcaptcha-model-factory\data\sunflower

I'm using Python 3.8.1.

torchvision==0.9.2
torch==1.8.2

Those two pins had to be changed because no available version could be found.

torchvision==0.10.0
torch==1.9.0

That is what I changed them to.

[question] auto label error

Question
I'm using auto label, but I always get a ValueError.

I saved each image with open(filename, "wb").write(bytes). Could the error be there?

Expected behavior


PS D:\python\hcaptcha-model-factory\src> py main.py new 
prompt[en] --> fish_underwater
2022-09-21 19:57:05 | DEBUG - Diagnose task | task_name=fish_underwater
Use AI to automatically label datasets? {'y', 'n'} --> y
please put all the images in the `unlabel` folder and press any key to continue...
2022-09-21 19:57:26 | INFO - Found 7 images in D:\python\hcaptcha-model-factory\data\fish_underwater\unlabel
2022-09-21 19:57:26 | DEBUG - Extracting embeddings...
2022-09-21 19:57:27 | INFO - Embeddings extracted
2022-09-21 19:57:27 | INFO - PCA..., shape of embs: (7, 512)
2022-09-21 19:57:27 | ERROR - An error has been caught in function '_CallAndUpdateTrace', process 'MainProcess' (8452), thread 'MainThread' (21844):
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    Fire(Scaffold)
  File "C:\Users\luanon404\AppData\Local\Programs\Python\Python38\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\luanon404\AppData\Local\Programs\Python\Python38\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
> File "C:\Users\luanon404\AppData\Local\Programs\Python\Python38\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "D:\python\hcaptcha-model-factory\src\apis\scaffold.py", line 70, in new
    ClusterLabeler(data_dir=data_dir).run()
  File "D:\python\hcaptcha-model-factory\src\components\auto_label\cluster.py", line 65, in run
    self.embs = PCA(n_components=self.num_feat).fit_transform(self.embs)
  File "C:\Users\luanon404\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\decomposition\_pca.py", line 407, in fit_transform
    U, S, Vt = self._fit(X)
  File "C:\Users\luanon404\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\decomposition\_pca.py", line 457, in _fit
    return self._fit_full(X, n_components)
  File "C:\Users\luanon404\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\decomposition\_pca.py", line 475, in _fit_full
    raise ValueError(
ValueError: n_components=128 must be between 0 and min(n_samples, n_features)=7 with svd_solver='full'

Desktop (please complete the following information):

  • OS: windows
  • Version: latest

Additional context
cf461699-6add-408a-aa9a-fd568c088c82
ec656c58-afcb-4644-a09c-4a00f8c4f6c2
f13f04dd-7284-4cd8-9d25-609dbc2bd418
fc658142-4c28-4778-ab1e-4979bf283bcd
0d903179-757d-4ef2-98d1-2b0bc25ef85d
4c1633ef-a109-41ab-8f94-b6fd45efff6f
a37fb4c2-930f-45cd-81c7-e7798cc15693
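
The ValueError in the traceback above states the constraint directly: with svd_solver='full', n_components must not exceed min(n_samples, n_features), and here only 7 images were supplied while 128 components were requested. One possible workaround is to clamp the requested value before building the PCA. This is a sketch only; `clamp_n_components` is a hypothetical helper, not the project's code:

```python
def clamp_n_components(requested, n_samples, n_features):
    """Return the largest PCA component count that fits the data shape."""
    limit = min(n_samples, n_features)
    if limit < 1:
        raise ValueError("need at least one sample and one feature")
    return min(requested, limit)


# Values taken from the log above: 7 images, 512-dimensional embeddings.
print(clamp_n_components(128, n_samples=7, n_features=512))  # prints 7
```

The result would then be passed as `PCA(n_components=...)`. Alternatively, putting at least 128 images in the `unlabel` folder satisfies the same constraint without any code change.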

Validation: Split Error

I'm getting this error:

Traceback (most recent call last):
  File "C:\Users\admin\Documents\GitHub\hcaptcha-model-factory\src\main.py", line 227, in <module>
    val()
  File "C:\Users\admin\Documents\GitHub\hcaptcha-model-factory\src\main.py", line 159, in val
    data = torchvision.datasets.ImageFolder(
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\torchvision\datasets\folder.py", line 310, in __init__
    super().__init__(
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\torchvision\datasets\folder.py", line 146, in __init__
    samples = self.make_dataset(self.root, class_to_idx, extensions, is_valid_file)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\torchvision\datasets\folder.py", line 190, in make_dataset
    return make_dataset(directory, class_to_idx, extensions=extensions, is_valid_file=is_valid_file)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\torchvision\datasets\folder.py", line 103, in make_dataset
    raise FileNotFoundError(msg)
FileNotFoundError: Found no valid file for the classes 0, 1. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp

This happens when I try to validate my ONNX model.
I tried splitting the data beforehand and using another path, but nothing worked.
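
The error means the class folders `0` and `1` exist but contain no files with a supported extension. A quick way to confirm this is to count the matching files per class folder. The sketch below uses only the standard library; `count_valid_images` is a hypothetical helper, and the extension list is copied from the error message above:

```python
from pathlib import Path

# Extensions copied from the error message above.
SUPPORTED = {".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp"}


def count_valid_images(root):
    """Map each class subfolder of `root` to its number of supported image files."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for path in class_dir.rglob("*")
                if path.is_file() and path.suffix.lower() in SUPPORTED
            )
    return counts
```

A class that maps to 0 is the one `ImageFolder` complains about. Typical causes are an empty folder or images saved with an extension outside the list, such as `.jfif`.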

[feat]

Could you add the jfif extension to your old project, the one where photos are manually separated into yes/bad? A lot of images come in that format, and that project is very easy to use.

[question]

I wanted to use the old version, but it is no longer available. I think it's simpler.

[Question]

Describe the bug
6 legs idk

To Reproduce
Steps to reproduce the behavior:

  1. Use the script like u normally would
  2. What the fuck

Expected behavior
i expected it to magically work

  • OS: [e.g. iOS]
  • Version [e.g. 0.1.7]

da problem

2022-11-17 13:24:38 | INFO - Found 20 images in E:\papka\hcaptcha-model-factory\data\strawberry_cake\unlabel
2022-11-17 13:24:38 | DEBUG - Extracting embeddings...
2022-11-17 13:24:43 | INFO - Embeddings extracted
2022-11-17 13:24:43 | INFO - PCA..., shape of embs: (20, 512)
2022-11-17 13:24:43 | INFO - PCA done, shape of embs: (20, 20)
2022-11-17 13:24:43 | DEBUG - Clustering...
2022-11-17 13:24:44 | DEBUG - Clustering done
2022-11-17 13:24:44 | INFO - Saving labels...
2022-11-17 13:24:44 | DEBUG - Labels saved
2022-11-17 13:24:44 | SUCCESS - Auto labeling completed
Start automatic training? {'y', 'n'} --> y
2022-11-17 13:24:49 | DEBUG - Diagnose task | task_name=strawberry_cake
2022-11-17 13:24:49 | ERROR - An error has been caught in function 'new', process 'MainProcess' (4468), thread 'MainThre
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    Fire(Scaffold)
  File "C:\Users\a\AppData\Local\Programs\Python\Python38\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\a\AppData\Local\Programs\Python\Python38\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\a\AppData\Local\Programs\Python\Python38\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTr
    component = fn(*varargs, **kwargs)
> File "E:\papka\hcaptcha-model-factory\src\apis\scaffold.py", line 78, in new
    Scaffold.train(task=task)
  File "E:\papka\hcaptcha-model-factory\src\apis\scaffold.py", line 98, in train
    model = Scaffold._model or ResNet(
  File "E:\papka\hcaptcha-model-factory\src\factories\kernel.py", line 93, in __init__
    self._build_env()
  File "E:\papka\hcaptcha-model-factory\src\factories\resnet.py", line 64, in _build_env
    raise FileNotFoundError(
FileNotFoundError: The structure of the dataset is incomplete | dir=E:\papka\hcaptcha-model-factory\data\strawberry_cake

Creating model

Hello.
I do not exactly understand how this works.
If I would like to create a model for trucks, for example, would I do this?
Put images that contain trucks in the "yes" folder and images that don't contain trucks in the "bad" folder.
Then run run.bat and wait.
Would this be correct?
Thanks!

adding on to model

I have a few questions. I am decent at Python but do not have much experience with AI.

  1. How can I train a new model? I see the instructions, but they are very hard to understand.

  2. How do I add on to the model in hcaptcha-challenger? I want to fix the model they have.

  3. How can I test it, so that I know how to use it?

[DOCS] ROOKIE FAQ

How to Install Requirements gracefully

👍 Before you start, create a python 3.10+ virtual environment.

  1. Install PyTorch

    You need to download the latest version of torch, torchvision and torchaudio from Start Locally | PyTorch

    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
  2. Download additional dependencies

    pip install -U numpy packaging protobuf onnxruntime opencv-python==4.5.5.62 pillow~=9.2.0 scikit-learn==1.0.1 fire~=0.4.0 loguru~=0.6.0 pyyaml~=6.0

FileNotFoundError: The structure of the dataset is incomplete

You need to collect challenge images from the hCaptcha challenge; in principle, the more pictures the better. For the first training round, put in at least 150 images (yes + bad).

You need to manually sort the images into the yes and bad folders and then run the task with the trainval command.

  • ~\hcaptcha-model-factory\data\sunflower\yes
  • ~\hcaptcha-model-factory\data\sunflower\bad
python main.py trainval --task sunflower
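
Before running trainval, the layout above can be checked with a few lines of standard-library Python. This is a sketch under stated assumptions: `check_dataset` is a hypothetical helper, the 150-image minimum comes from the FAQ text, and the project's own `_build_env` check may apply different conditions that are not shown here:

```python
from pathlib import Path


def check_dataset(task_dir, minimum=150):
    """Return a list of problems with the yes/bad layout; an empty list means it looks complete."""
    problems = []
    total = 0
    for label in ("yes", "bad"):
        folder = Path(task_dir) / label
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
            continue
        total += sum(1 for path in folder.iterdir() if path.is_file())
    # Only report the image count once both folders are present.
    if not problems and total < minimum:
        problems.append(f"only {total} images, expected at least {minimum}")
    return problems
```

Calling `check_dataset("data/sunflower")` on a freshly auto-labeled task would list whichever of the two folders is missing, which is the likely situation behind the FileNotFoundError in the reports above.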

problem

Torch not compiled with CUDA enabled

[question] Turn on CUDA

Hello. I can't turn on CUDA. Everything that needed to be installed is installed, and it works fine on the CPU, but I don't know how to switch it to the GPU.
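
The message "Torch not compiled with CUDA enabled" usually indicates that a CPU-only build of PyTorch is installed. A short check, using the real `torch.cuda.is_available()` call, shows which device the current installation can use. `pick_device` is a hypothetical helper name, and the sketch falls back to "cpu" when PyTorch is not installed at all:

```python
import importlib.util


def pick_device():
    """Return "cuda" when a CUDA-enabled PyTorch build can see a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, reinstalling PyTorch from a CUDA wheel index, as in the install command given in the ROOKIE FAQ issue, is the usual remedy.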

add a new photo extension

Most of the photos downloaded from hCaptcha come in jfif format. Could you add this extension to your project?
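
Until the extension is supported, one workaround is to rename the files: JFIF is the JPEG File Interchange Format, so a `.jfif` file normally holds ordinary JPEG data. The sketch below assumes that is true of the downloaded files; `rename_jfif_to_jpg` is a hypothetical helper, not part of the project:

```python
from pathlib import Path


def rename_jfif_to_jpg(folder):
    """Rename every *.jfif file under `folder` to *.jpg and return the new paths."""
    renamed = []
    # sorted() materializes the listing first, so renaming does not disturb iteration.
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".jfif":
            target = path.with_suffix(".jpg")
            if target.exists():
                continue  # do not overwrite an existing file with the same stem
            path.rename(target)
            renamed.append(target)
    return renamed
```

Only the file name changes; the image bytes are left untouched.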

feat(pending): motion workflow

Description: quick-train for the hCAPTCHA Animal challenge.
Target: expect to deliver the pluggable model within ten minutes.

doubt

image

This happens when I open the project.

[Workflow] Use the trained model to classify unlabeled dataset

# -*- coding: utf-8 -*-
# Time       : 2022/08/19 7:33
# Author     : QIN2DIM
# Github     : https://github.com/QIN2DIM
# Description: Use the trained model to classify unlabeled data sets
import os
import shutil

import cv2
import numpy as np


def load_model(path_model_onnx):
    if (
        not os.path.isfile(path_model_onnx)
        or not path_model_onnx.endswith(".onnx")
        or not os.path.getsize(path_model_onnx)
    ):
        raise RuntimeError
    return cv2.dnn.readNetFromONNX(path_model_onnx)


def classify(net, data):
    img_arr = np.frombuffer(data, np.uint8)
    img = cv2.imdecode(img_arr, flags=1)

    img = cv2.resize(img, (64, 64))
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (64, 64), (0, 0, 0), swapRB=True, crop=False)

    net.setInput(blob)
    out = net.forward()
    # Class index 0 is the positive ("yes") class.
    return bool(np.argmax(out, axis=1)[0] == 0)


def run():
    """
    RecurTraining Motion workflow
    ---------

    bird_flying         # handled label name
     ├── _inner         # recur-output
     │    ├── yes
     │    └── bad
     ├── yes            # labeled dataset for train/val
     ├── bad            # labeled dataset for train/val
     └── *.jpg          # unlabeled dataset

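     1. Before anything else, manually classify about 100 images (positive and
        negative combined) and obtain the first ONNX model through the normal trainval workflow;
     2. Once you have accumulated more unlabeled images, use the recur workflow to label them with the model (binary image classification);
     3. Manually review the model output and correct the few misclassified images; you can change the label or delete the image;
     4. Merge the datasets: merge the recur output in _inner/yes and _inner/bad into the labeled dataset directories;
     5. Train again on the merged dataset.

     more: repeat the cycle to keep iterating on the model.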
    """
    # Path to the ONNX model
    model_path = "drawing_of_a_haunted_house.onnx"
    # Path to the unlabeled dataset
    dataset_dir = "drawing_of_a_haunted_house"
    # Path to the recur-output
    output_dir_yes = os.path.join(dataset_dir, "_inner/yes")
    output_dir_bad = os.path.join(dataset_dir, "_inner/bad")
    # Initialize output directory
    os.makedirs(output_dir_yes, exist_ok=True)
    os.makedirs(output_dir_bad, exist_ok=True)

    # Load the model produced by the previous iteration
    model = load_model(model_path)

    for index, fn in enumerate(img_fns := os.listdir(dataset_dir)):
        # skip nested folders
        img_src = os.path.join(dataset_dir, fn)
        if os.path.isfile(img_src):
            with open(img_src, "rb") as file:
                data = file.read()
            img_dst = os.path.join(output_dir_yes if classify(model, data) else output_dir_bad, fn)
            shutil.move(img_src, img_dst)
        if index % 50 == 0:
            print(f">> recur - progress=[{index}/{len(img_fns)}]")


if __name__ == "__main__":
    run()
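
The script above covers the labeling step of the workflow, while step 4 of its docstring (merging the `_inner` output back into the labeled folders) is left to the user. A minimal sketch of that merge, assuming the folder layout drawn in the docstring; `merge_recur_output` is a hypothetical helper and should only be run after the manual review in step 3:

```python
import shutil
from pathlib import Path


def merge_recur_output(dataset_dir):
    """Move reviewed files from _inner/yes and _inner/bad into yes/ and bad/."""
    moved = 0
    for label in ("yes", "bad"):
        source = Path(dataset_dir) / "_inner" / label
        target = Path(dataset_dir) / label
        if not source.is_dir():
            continue
        target.mkdir(parents=True, exist_ok=True)
        for path in sorted(source.iterdir()):
            # Skip name collisions rather than overwrite already-labeled images.
            if path.is_file() and not (target / path.name).exists():
                shutil.move(str(path), str(target / path.name))
                moved += 1
    return moved
```

After the merge, running trainval again on the same task trains on the enlarged dataset, which is step 5 of the docstring.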

How do I help get new captchas solved quicker

Hello, I'm using your program and it works great!

I want to help you update your program more frequently.
I already installed hcaptcha-model-factory, but when I try to train a model so that I can upload the files for you, it needs more pictures.
Can you give me a step-by-step guide on how you update your program?
Maybe a video would be nice, so that I can help you get updates out quicker!

Thank you!

Predict image with YOLO model "yolov5n6.onnx"

Dear sir, can you help me classify images with the model "yolov5n6.onnx"?
The other models I tried return the result "0" or "1".
But with this model I don't know which way to go.
If possible, please give me example code.
Thanks for your support.
