fashion's Introduction

Recommendation Engine

This recommendation engine has two objectives. The first is to build an experimental recommendation engine with the pgvector extension and PL/Python. The second is to build the same engine with the aidb extension, to demonstrate how easy the extension is to implement and how it abstracts complexity without compromising functionality.

Catalog Using PostgreSQL & pgvector

The objective of this experiment is to use the CLIP model together with PostgreSQL, employing the pgvector extension and PL/Python to run the transformation functions directly inside the database for efficient searching. The setup uses a dataset of about 44,000 catalog images.

Instead of storing the images directly in the database, we store only their full file paths. The actual data stored in the database consists of image embeddings, which are generated by the CLIP model and encapsulated in 512-dimensional vectors as required by the model. This approach enables rapid search capabilities on a standard laptop.
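As a minimal sketch of what "store the path and the embedding, not the image" can look like on the client side, the snippet below formats an embedding as a pgvector text literal and pairs it with the file path. The helper names (to_pgvector_literal, make_row) are hypothetical and are not taken from this repository; only the 512-dimension requirement comes from the text above.

```python
EMBEDDING_DIM = 512  # CLIP embedding size stated in this README


def to_pgvector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def make_row(image_path, embedding):
    """Build the (path, vector literal) pair to store; the image bytes stay on disk."""
    return (str(image_path), to_pgvector_literal(embedding))


if __name__ == "__main__":
    # A constant vector stands in for a real CLIP embedding.
    path, literal = make_row("dataset/images/1163.jpg", [0.5] * EMBEDDING_DIM)
    print(path, literal[:20] + "...")
```

Checking the dimension before sending the value to the database gives a clearer error than the one the vector(512) column would raise on insert.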

We also demonstrate text-to-image search: searching the catalog with free text as input.
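Text-to-image search is possible because CLIP maps text and images into the same vector space, so a text embedding can be compared directly with image embeddings. The toy example below illustrates the ranking step in plain Python with cosine similarity. The 3-dimensional vectors and file names are made up for illustration; in the real setup the vectors are 512-dimensional and the comparison runs inside PostgreSQL.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings, top_k=3):
    """Return image paths ordered from most to least similar to the text embedding."""
    scored = [
        (cosine_similarity(text_embedding, emb), path)
        for path, emb in image_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [path for _, path in scored[:top_k]]


# Toy vectors standing in for real CLIP embeddings.
catalog = {
    "images/red_shoes.jpg": [0.9, 0.1, 0.0],
    "images/black_shoes.jpg": [0.1, 0.9, 0.0],
    "images/blue_shirt.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend this is the embedding of the text "red shoes"

if __name__ == "__main__":
    print(rank_images(query, catalog, top_k=2))  # the red shoes image ranks first
```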

Sample Dataset

Download the dataset from https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small/download?datasetVersionNumber=1 and unzip it into a folder such as dataset/images
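Before loading anything into the database, it can help to confirm that the images landed where you expect. The sketch below lists the image files in the folder and returns their full paths, which is the form stored in the database. The function name is hypothetical, and the .jpg suffix is an assumption about the dataset; adjust it if your files differ.

```python
from pathlib import Path


def list_image_paths(folder="dataset/images", suffix=".jpg"):
    """Return the sorted absolute paths of all files in `folder` with the given suffix."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"image folder not found: {root}")
    return sorted(
        str(p.resolve()) for p in root.iterdir() if p.suffix.lower() == suffix
    )


if __name__ == "__main__":
    try:
        paths = list_image_paths()
        print(f"{len(paths)} images found")
    except FileNotFoundError as exc:
        print(exc)
```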

Install all dependencies for transformers and verify that the CLIP model is available.

PostgreSQL 16 must be installed.

If you use aidb:

The aidb extension must be installed. If it is not, follow the step-by-step installation guide: https://www.enterprisedb.com/docs/edb-postgres-ai/ai-ml/install-tech-preview/

If you use pgvector and PL/Python instead of aidb:

The EDB Language Pack must be installed. Install the pgvector 0.6 extension from https://github.com/pgvector/pgvector and verify that plpython3u is working.

Run select public.test_plpython() inside the database:

postgres=# select public.test_plpython();
     test_plpython
-----------------------
 PL/Python is working!
(1 row)

After the installations above, install the requirements.

Run pip install from the EDB Python directory: pip install -r requirements.txt

Python environment: the Python environment accessible to PostgreSQL must have the necessary libraries installed.

Run

Run with aidb extension

First, run connect_encode.py to install the aidb extension and to create and refresh the retriever, which collects the image data from an S3 bucket and generates the embeddings. The images must already be stored in that S3 bucket before you run the script. Pass the bucket name as an argument, as shown below:

%python code/connect_encode.py s3-bucket-name

Similarity Search Using the Streamlit Application: Catalog Search and Free-Text Search on the Catalog

Update the database connection settings (port, username, password) in the create_db_connection function and in the DATABASE_URL variable.
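As an illustration only, a DATABASE_URL in the standard PostgreSQL URI form can be assembled as shown below. The helper build_database_url and all values are hypothetical; check how the application actually reads DATABASE_URL before copying this. Percent-encoding the user name and password avoids breakage when they contain characters such as @ or /.

```python
from urllib.parse import quote


def build_database_url(user, password, host="localhost", port=5432, dbname="postgres"):
    """Assemble a postgresql:// URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


# Placeholder credentials; replace them with your own.
DATABASE_URL = build_database_url("postgres", "p@ss/word", port=5432)

if __name__ == "__main__":
    print(DATABASE_URL)
```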

To run the application with aidb, use the command below:

%streamlit run code/app_search_aidb.py

Example search texts: red shoes, red women shoes, black shoes, ...

Run without aidb (pgvector and PL/Python only)

Create products_emb and install the plpython3u functions

Open psql and create the table:

DROP TABLE IF EXISTS products_emb;

CREATE TABLE products_emb (
    Id integer,
    gender VARCHAR(50),
    masterCategory VARCHAR(100),
    subCategory VARCHAR(100),
    articleType VARCHAR(100),
    baseColour VARCHAR(50),
    Season text,
    year INTEGER,
    usage text null,
    productDisplayName TEXT null,
    Image_path text null, 
    embedding vector(512)
);

Install the functions from the DDL folder:

1 - load_fashion_tag: this function reads the products table and inserts its rows into the new products_emb table, adding two columns: embedding and image_path.

postgres=# select load_fashion_tag('dataset/images','product', 32);

NOTICE:  Processed 32 images in 1.172111988067627 seconds. rows inserted 32

NOTICE:  Processed 32 images in 0.810783863067627 seconds. rows inserted 64

NOTICE:  Processed 32 images in 0.8137722015380859 seconds. rows inserted 96

NOTICE:  Processed 32 images in 0.9017457962036133 seconds. rows inserted 128

NOTICE:  Processed 32 images in 0.8105340003967285 seconds. rows inserted 160

NOTICE:  Processed 32 images in 0.8043057918548584 seconds. rows inserted 192
...

NOTICE:  Processed 32 images in 0.786837100982666 seconds. rows inserted 44411


NOTICE:  Processed 32 images in 0.6174778938293457 seconds. rows inserted 44435

NOTICE:  Total Rows: 44435 Total function execution time: 1283.9920008182526 seconds. Model loading time: 2.249537944793701 seconds. Fetching time: 0.05452418327331543 seconds.
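Judging by the NOTICE lines, the third argument (32 in the call above) appears to act as a batch size: images are processed in groups and the running row count is reported after each group. The sketch below shows that batching pattern in isolation. It is not the code of load_fashion_tag, and iter_batches is a hypothetical helper name.

```python
def iter_batches(items, batch_size=32):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 70 placeholder file names split into batches of 32.
paths = [f"{i}.jpg" for i in range(70)]
sizes = [len(batch) for batch in iter_batches(paths, 32)]

if __name__ == "__main__":
    print(sizes)  # two full batches followed by a final partial one
```

Batching amortizes per-call overhead when the model encodes several images at once; the best batch size depends on available memory.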

Generate Embedding

From psql, run the following, passing the full path of the folder where you unzipped the images:

postgres=# select load_fashion_tag('/Users/francksidi/Downloads/archive/images','product', 32);

Similarity Search Using the Streamlit Application: Catalog Search and Free-Text Search on the Catalog

The application is in the code directory. Update the connection information inside the script and copy logo.png into the directory from which you run the Python program. Then run the application from the command line and try searches such as: red shoes, red women shoes, black shoes.

%streamlit run app_search_adv.py

Similarity Search Using the Streamlit Application: Catalog Search, Free-Text Search on the Catalog, and Search on Similar Images

The application is in the code directory. Update the connection information inside the script and copy logo.png into the directory from which you run the Python program. Then run the application from the command line and try searches such as: red shoes, red women shoes, black shoes. You can also upload an image and search for similar ones.

%streamlit run app_search_final.py
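For readers curious about what a nearest-neighbour lookup against products_emb might look like, here is a hedged sketch that builds a parameterized query using pgvector's cosine distance operator (<=>). It is an assumption that the applications use cosine distance; pgvector also offers <-> (L2 distance) and <#> (negative inner product). The function name is hypothetical, and the %(name)s placeholders assume a psycopg2-style driver.

```python
def build_similarity_query(query_embedding, limit=10):
    """Return (sql, params) for a cosine-distance search over products_emb."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # pgvector accepts vectors as text literals such as '[0.1,0.2]'.
    literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    sql = (
        "SELECT id, productDisplayName, image_path, "
        "embedding <=> %(q)s::vector AS distance "
        "FROM products_emb "
        "ORDER BY embedding <=> %(q)s::vector "
        "LIMIT %(k)s"
    )
    return sql, {"q": literal, "k": limit}


if __name__ == "__main__":
    # A 2-value vector keeps the output short; real queries need 512 values.
    sql, params = build_similarity_query([0.0, 1.0], limit=5)
    print(sql)
    print(params)
```

The same query serves both text search and similar-image search; only the source of the query embedding changes (a text embedding or the embedding of the uploaded image).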

fashion's People

Contributors

francksidi-edb, bilge-ince, sergioenterprisedb
