t2i-urban

Overview

This project is my bachelor's thesis, titled "Generation of images of urban landscapes in high resolution from a textual description using deep learning methods."

Given a text whose action takes place in a city, the service generates 1024x1024 images of the corresponding urban location.

The service works as follows:

  1. The service accepts a request containing text in Russian. The text is translated into English.
  2. Next, the text is sent to a third-party service that predicts its location using NLP methods. The location is assumed to be urban.
  3. Images matching the text description of the predicted location are downloaded from free stock photo sites (Depositphotos, Unsplash).
  4. The downloaded images are segmented with SegFormer into classes corresponding to objects of the urban environment (road, sidewalk, building, etc.).
  5. For each resulting segmentation map, several images are generated using OASIS.
  6. The generated images that best match the text description are selected by CLIP score.
  7. Finally, 2x super-resolution is applied to each image using Real-ESRGAN.
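The steps above can be sketched as a single orchestration function. This is a minimal sketch, not the project's code: every stage is a hypothetical callable injected through the `Stages` dataclass, and the `per_map` and `top_k` parameters are assumptions. It also assumes CLIP scores are computed against the translated input text; the project may score against the location description instead.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stages:
    # Hypothetical stage signatures; the real project wires in a translator,
    # the location service, stock-photo search, SegFormer, OASIS, CLIP and Real-ESRGAN.
    translate: Callable[[str], str]                  # Russian text -> English text
    predict_location: Callable[[str], str]           # text -> urban location
    fetch_photos: Callable[[str], List[object]]      # location -> stock photos
    segment: Callable[[object], object]              # photo -> segmentation map
    generate: Callable[[object, int], List[object]]  # (map, n) -> n generated images
    clip_score: Callable[[object, str], float]       # (image, text) -> similarity
    upscale: Callable[[object], object]              # image -> 2x super-resolved image


def run_pipeline(text_ru: str, stages: Stages, per_map: int = 3, top_k: int = 10) -> List[object]:
    """Run steps 1-7 and return the top_k upscaled images, best first."""
    text_en = stages.translate(text_ru)
    location = stages.predict_location(text_en)

    # Steps 3-5: every stock photo yields one segmentation map, and every map
    # yields per_map generated candidates.
    candidates: List[object] = []
    for photo in stages.fetch_photos(location):
        seg_map = stages.segment(photo)
        candidates.extend(stages.generate(seg_map, per_map))

    # Step 6: rank candidates by CLIP score against the text, highest first.
    ranked = sorted(candidates, key=lambda img: stages.clip_score(img, text_en), reverse=True)

    # Step 7: super-resolve only the selected images.
    return [stages.upscale(img) for img in ranked[:top_k]]
```

Injecting the stages keeps the control flow testable with cheap stand-ins (for example, integers as "images") before the heavy models are loaded.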

Installation

Download checkpoints

Segmentation checkpoints

Download the SegFormer weights from here and unpack them into segmentation/.

Generation checkpoints

Download the OASIS weights from here and unpack them into generation/OASIS/checkpoints.

Docker build

docker build -t t2i-urban .
docker run -d -p 8080:8080 --name=[container-name] t2i-urban

Install packages

docker start [container-name]
docker attach [container-name]
pip install -v -e .

Usage

Send request

The running Docker container starts a server on localhost:8080 that accepts requests in the following format:

  • method: get-images
  • Content-Type: application/json
  • request format: {"text": string}
  • response format: {"result": list<base64-string>} (list of base64-encoded images)
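A client for this API might look like the following sketch, using only the Python standard library. It assumes the endpoint is `POST http://localhost:8080/get-images`, since the README names the method `get-images` but not the HTTP verb or path; `build_request`, `decode_response`, `get_images`, and the 600-second timeout are hypothetical choices (the long timeout reflects that a request runs several models).

```python
import base64
import json
import urllib.request
from typing import List

# Assumption: verb and path are not documented; adjust if the server differs.
SERVER_URL = "http://localhost:8080/get-images"


def build_request(text: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Build a JSON request of the form {"text": string}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def decode_response(raw: bytes) -> List[bytes]:
    """Decode {"result": [base64-string, ...]} into raw image bytes."""
    payload = json.loads(raw)
    return [base64.b64decode(item) for item in payload["result"]]


def get_images(text: str, timeout: float = 600.0) -> List[bytes]:
    """Send the text to the server and return the decoded images."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return decode_response(resp.read())
```

Each returned element is raw image bytes that can be written to disk, for example with `pathlib.Path("0.png").write_bytes(data)`; the image format is not documented here, so the extension is a guess.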

Manually

To generate images manually, run main.py. Usage:

python main.py --text TEXT [--save-dir DIR] [--samples SAMPLES] [--from-text]

--text TEXT         - input text
--save-dir DIR      - results will be saved here (default is './results')
--samples SAMPLES   - number of images to be generated (default is 10)
--from-text         - generate images directly from the input text instead of
                      from its predicted location (the text must then describe
                      an urban location)
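For reference, the documented options map onto a parser like the one below. This is a hypothetical reconstruction with Python's `argparse`, not the actual `main.py`; it only illustrates the documented defaults and that `--save-dir` and `--from-text` become the `save_dir` and `from_text` attributes.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented CLI; main.py may define it differently.
    parser = argparse.ArgumentParser(description="Generate urban images from text.")
    parser.add_argument("--text", required=True, help="input text")
    parser.add_argument("--save-dir", default="./results", help="directory for results")
    parser.add_argument("--samples", type=int, default=10, help="number of images to generate")
    parser.add_argument(
        "--from-text",
        action="store_true",
        help="generate from the text itself instead of its predicted location",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Under these assumptions, `python main.py --text "..." --samples 4 --from-text` yields `samples == 4`, `from_text is True`, and the default `save_dir` of `./results`.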

Important:

The third-party service that predicts the text's location may be unavailable. In that case, use the --from-text option to generate images directly from the input text; the text must then describe an urban location.
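The README leaves this fallback to the user. If you script around the service, the same rule can be automated. The sketch below is a hypothetical helper (`resolve_prompt` and the injected `predict_location` callable are not part of the project) that falls back to the input text when the location service raises `OSError`, which covers `ConnectionError`, timeouts, and `urllib.error.URLError`. As with `--from-text`, the fallback only makes sense if the text itself describes an urban location.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def resolve_prompt(text: str,
                   predict_location: Callable[[str], str],
                   from_text: bool = False) -> str:
    """Return the description used to search for stock photos.

    Hypothetical helper: uses the predicted location when the service responds,
    and falls back to the input text (the --from-text behaviour) otherwise.
    """
    if from_text:
        return text
    try:
        return predict_location(text)
    except OSError as exc:  # connection refused, timeouts, URLError, ...
        logger.warning("Location service unavailable (%s); using the input text", exc)
        return text
```

Catching `OSError` rather than every exception keeps genuine bugs in the prediction code visible instead of silently switching modes.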

Contributors

annie4ka99
