
TANet's Introduction


[Readers in China: please see the more detailed Chinese documentation.] This repo contains the official implementation and the new IAA dataset TAD66K of our IJCAI 2022 paper. Our new work at ICCV 2023: Link

Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks

Shuai He, Yongchang Zhang, Rui Xie, Dongxiang Jiang, Anlong Ming

Beijing University of Posts and Telecommunications


TAD66K  

Introduction

  • We built a large-scale dataset called the Theme and Aesthetics Dataset with 66K images (TAD66K), which is specifically designed for IAA. Specifically, (1) it is a theme-oriented dataset containing 66K images covering 47 popular themes, and every image was carefully selected by hand according to its theme; (2) in addition to common aesthetic criteria, we provide 47 criteria, one for each of the 47 themes. Images of each theme are annotated independently, and each image contains at least 1,200 effective annotations (the richest annotations to date). These high-quality annotations can help provide deeper insight into the performance of models.

[Example images from the TAD66K dataset]

Download Dataset

  • Download from here: google. The archive contains images with the longest side scaled to 800 pixels and labels categorized by theme.
  • Or here: baidu, extraction code: 8888.

TANet  

Introduction

We propose a baseline model, called the Theme and Aesthetics Network (TANet), which can maintain a constant perception of aesthetics to effectively deal with the problem of attention dispersion. Moreover, TANet can adaptively learn the rules for predicting aesthetics according to the recognized theme. Comparing TANet with 17 methods on three representative datasets (AVA, FLICKR-AES, and the proposed TAD66K), TANet achieves state-of-the-art performance on all three.

[Figure: TANet performance comparison]

Environment Installation

  • pandas==0.22.0
  • nni==1.8
  • requests==2.18.4
  • torchvision==0.8.2+cu101
  • numpy==1.13.3
  • scipy==0.19.1
  • tqdm==4.43.0
  • torch==1.7.1+cu101
  • scikit_learn==1.0.2
  • tensorboardX==2.5
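
For convenience, the pinned versions above can be installed in one go; a sketch of the commands (note that the +cu101 builds of torch and torchvision live on the PyTorch wheel index, hence the extra -f flag, and the CUDA suffix should be adjusted to your setup):

pip install pandas==0.22.0 nni==1.8 requests==2.18.4 numpy==1.13.3 scipy==0.19.1 tqdm==4.43.0 scikit_learn==1.0.2 tensorboardX==2.5
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html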

How to Run the Code

  • We used the hyperparameter tuning tool nni; you may want to learn how to use this tool first (it only takes a few minutes of your time), because both training and testing are run through it.
  • To train or test, run: nnictl create --config config.yml -p 8999
  • The Web UI URLs are: http://127.0.0.1:8999 or http://172.17.0.3:8999
  • Note: nni is not strictly necessary. If you don't want to use this tool, just make small modifications to our code, such as changing param_group['lr'] to param_group.lr, etc. (see the sketch after this list).
  • PS: The training code for the FLICKR-AES dataset may not be made public, because we are currently cooperating with a company; the relevant model has been embedded into their system, and there are some confidentiality requirements.
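
If you prefer to run without nni entirely, here is a minimal sketch of the idea (not the authors' exact code): keep the hyperparameters as a plain dict instead of asking NNI for them. The nni.get_next_parameter() call and the key names below are assumptions about how train_nni.py is organized.

# Hypothetical sketch: run training without NNI by falling back to fixed hyperparameters.
try:
    import nni
    params = nni.get_next_parameter()   # populated when launched via nnictl
except Exception:
    params = {}

# Fixed fallback values when NNI is not driving the run (key names are assumptions).
params.setdefault('init_lr', 3e-7)
params.setdefault('batch_size', 40)

# ...then pass `params` into the existing training loop in place of the NNI-provided dict.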

If you find our work useful, please cite our paper:

@article{herethinking,
  title={Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks},
  author={He, Shuai and Zhang, Yongchang and Xie, Rui and Jiang, Dongxiang and Ming, Anlong},
  journal={IJCAI},
  year={2022},
}

Try!

TANet.real-time.inference.video.1.mp4
TANet.real-time.inference.video.2.mp4
TANet.real-time.inference.video.3.mp4

TANet's People

Contributors

mrobotit, woshidandan, zstbupt, zyber404


TANet's Issues

nni.report_intermediate_result only reports once per trial during NNI training

I'd like to ask whether anyone has run into the situation described in the title when training on the TAD66K dataset.

My situation is roughly as follows:

  1. In the NNI WebUI, each trial only shows the intermediate result for epoch=0, but in the log file I can see the train and validate progress bars for multiple epochs.

  2. If I train without nni, the results printed in the console are normal.

In addition, I'd like to ask the authors whether the model-saving code needs to be written by ourselves, and roughly what the best metrics obtained by the authors after training were.
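
Not an official answer, but the NNI WebUI only plots what the trial explicitly reports, so the usual pattern is to call nni.report_intermediate_result once per epoch and nni.report_final_result once at the end, and to save checkpoints yourself. A minimal sketch, where train_one_epoch, validate_one_epoch, model and the loaders are placeholders:

import nni
import torch

num_epochs = 200                                  # as in the paper's setup
best_srcc = -1.0
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)          # placeholder training step
    srcc = validate_one_epoch(model, val_loader)  # placeholder validation metric

    nni.report_intermediate_result(srcc)          # one point per epoch in the WebUI

    if srcc > best_srcc:                          # model saving is up to you
        best_srcc = srcc
        torch.save(model.state_dict(), f'best_epoch{epoch}.pth')

nni.report_final_result(best_srcc)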

About training with NNI

Hello, I tried to train with nni. After entering nnictl create --config config.yml -p 8999, I got the error: ERROR: "config.yml" is not a valid file. Looking forward to your reply.
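
Not an official answer, but this error usually just means nnictl cannot find the file at that relative path. Running the command from the directory that actually contains config.yml (for example code/TAD66K in this repo, as in the issue below) or passing an absolute path typically resolves it:

cd code/TAD66K
nnictl create --config config.yml -p 8999
# or: nnictl create --config /absolute/path/to/config.yml -p 8999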

Which hyperparameters should be set individually?

Hi, I have successfully run the code, with a setup exactly the same as yours.

(base) PS C:\Users\96502> cd E:\Project\HeterogeneousComputing\TANet-main\TANet-main\code\TAD66K
(base) PS E:\Project\HeterogeneousComputing\TANet-main\TANet-main\code\TAD66K> conda activate TANet_py39
(TANet_py39) PS E:\Project\HeterogeneousComputing\TANet-main\TANet-main\code\TAD66K> nnictl create --config config.yml -p 8999
INFO:  expand searchSpacePath: search_space.json to E:\Project\HeterogeneousComputing\TANet-main\TANet-main\code\TAD66K\search_space.json
INFO:  expand codeDir: . to E:\Project\HeterogeneousComputing\TANet-main\TANet-main\code\TAD66K\.
INFO:  Starting restful server...
INFO:  Successfully started Restful server!
INFO:  Setting local config...
INFO:  Successfully set local config!
INFO:  Starting experiment...
INFO:  Successfully started experiment!

The experiment id is Jo44BgAs
The Web UI urls are: http://169.254.52.43:8999   http://169.254.179.254:8999   http://169.254.150.240:8999   http://10.32.94.167:8999   http://169.254.37.46:8999   http://169.254.73.179:8999   http://172.25.32.1:8999   http://127.0.0.1:8999
------------------------------------------------------------------------------------
You can use these commands to get more information about the experiment
------------------------------------------------------------------------------------
         commands                       description
1. nnictl experiment show        show the information of experiments
2. nnictl trial ls               list all of trial jobs
3. nnictl top                    monitor the status of running experiments
4. nnictl log stderr             show stderr log content
5. nnictl log stdout             show stdout log content
6. nnictl stop                   stop an experiment
7. nnictl trial kill             kill a trial job by id
8. nnictl --help                 get help information about nnictl
------------------------------------------------------------------------------------
Command reference document https://nni.readthedocs.io/en/latest/Tutorial/Nnictl.html
------------------------------------------------------------------------------------

But I don't know why the trials always fail in the web UI.

[Screenshot: trials shown as failed in the NNI web UI]

Have you ever encountered such a problem?
And here are the configurations in my case:

'config.yml'

authorName: default
experimentName: HyperNet_NNI_Come_on
trialConcurrency: 1
maxExecDuration: 1000h
maxTrialNum: 200
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python train_nni.py
  codeDir: .
  gpuNum: 1
localConfig:
  useActiveGpu: true

The modifications in 'option.py'

def init():
    parser = argparse.ArgumentParser(description="PyTorch")
    parser.add_argument('--path_to_images', type=str, default='E:\\dataset\\TAD66K_jmg',
                        help='directory to images')
    parser.add_argument('--path_to_save_csv', type=str,default="./dataset/dataset/merge/",
                        help='directory to csv_folder')
    parser.add_argument('--experiment_dir_name', type=str, default='.',
                        help='directory to project')

Sincerely hope to receive your reply!

Python version

Hi, when setting up the experiment environment I tried Python 3.5, 3.6 and 3.7, and none of them could be configured successfully. May I ask which Python version was originally used?

About the model code

Hello, there are two places in the model code that I don't quite understand. Could you help take a look?

  1. The forward function in class TargetNet
    def forward(self, x, paras):

        q = self.fc1(x)
        # print(q.shape)
        q = self.bn1(q)
        q = self.relu1(q)
        q = self.drop1(q) 

        self.lin = nn.Sequential(TargetFC(paras['res_last_out_w'], paras['res_last_out_b']))
        q = self.lin(q)
        q = self.softmax(q)
        return q

Here, the shape of res_last_out_w is [batch_size, 100], res_last_out_b is [batch_size, 1], and the input tensor of self.lin has shape [batch_size, 100], so the output tensor of self.lin has shape [batch_size, batch_size], i.e., its shape depends on the batch size. If batch_size is 1, the output tensor of this function is fixed at shape [1, 1] and its value is always 1, which means the theme-network branch outputs a constant value of 1. Isn't that a problem? (See the small numeric illustration after this issue.)

  2. In the Attention function
def Attention(x):
    batch_size, in_channels, h, w = x.size()
    quary = x.view(batch_size, in_channels, -1)
    key = quary
    quary = quary.permute(0, 2, 1)

    sim_map = torch.matmul(quary, key)

    ql2 = torch.norm(quary, dim=2, keepdim=True)
    kl2 = torch.norm(key, dim=1, keepdim=True)
    sim_map = torch.div(sim_map, torch.matmul(ql2, kl2).clamp(min=1e-8))

    return sim_map

This implementation seems different from what the paper describes. What is done here looks like dividing the raw similarity map by the similarity map of the norms (i.e., computing a cosine-similarity map), rather than the conventional attention with V removed as described in the paper.
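
A tiny numeric illustration of the first point in the issue above: softmax over a single element is always 1, so when the batch size is 1 the [1, 1] output of self.lin collapses to a constant regardless of its value (the shapes here are taken from the issue, not re-verified against the repo):

import torch

q = torch.randn(1, 1)            # bs=1 -> lin output is [batch_size, batch_size] = [1, 1]
print(torch.softmax(q, dim=-1))  # tensor([[1.]]) no matter what value q contains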

Inference results when loading the pretrained weights

Directly loading the pretrained weights and running inference on AVA and TAD66K, the metrics don't seem to reach those reported for the model. May I ask what the reason is?
For example, loading the weights directly gives LCC 0.764 / SRCC 0.755 on AVA, and LCC 0.526 / SRCC 0.507 on TAD66K.

Inference results differ for different batch sizes

Problem 1 mentioned in #3 causes the output of the classification branch to depend on the batch size: when the batch size differs, the inference results also differ, and when bs=1 the classification branch output is fixed at 1.

How can the problem of inference results differing across batch sizes be solved?
On-device deployment usually uses bs=1, but then the classification branch output is fixed at 1, so it no longer plays any role, right?

Some questions about reproducibility

Dear author,

As mentioned in the supplementary materials, the hyperparameters were set as follows:

To train our TANet, we used Microsoft's neural network intelligence (NNI) tuning tool, where the learning rate search space from L1 to L6 was set as [0.000001, 0.0000001, 0.0000003], without any decay rate strategy. Specifically, we set the input size to 40 and the number of training epochs to 200. We used 224 × 224 crops from 256 × 256 fixed images as input.

However, achieving the desired training results in a single step with this particular combination of hyperparameters has proven to be challenging due to the absence of a decay rate strategy. I wonder if you performed multiple training steps, manually adjusting the learning rate to obtain the best results. For example, training for a certain number of epochs with one set of hyperparameters and then building upon that by training further, ensuring the total number of training epochs reaches 200, rather than directly training for 200 epochs using a specific set of hyperparameters. If this is the case, how can the ablation experiment be performed to ensure the accuracy of the experiment? Additionally, the provision of random seeds was not mentioned.

I would greatly appreciate your insights and guidance on this matter.

Warm regards,

Vanessa

Hyperparameters to replicate the paper's results

Hi, thank you for your great work!

I was wondering if you could share the hyperparameters you ended up finding and using, so that we can quickly reproduce the results of the paper. It is certainly possible to search for hyperparameters using nni, but I think it would be a time-consuming and rather pointless process.

Thanks again!

Results interpretation

Hello, thanks for the work and for providing the code.

I ran eval() on a batch of 8 images and here is the result:

([[0.0145, 0.0029, 0.0645, 0.2702, 0.4544, 0.1338, 0.0399, 0.0154, 0.0011,
0.0032],
[0.0290, 0.0171, 0.0896, 0.2024, 0.2972, 0.1970, 0.0950, 0.0444, 0.0119,
0.0164],
[0.0121, 0.0162, 0.0500, 0.1337, 0.2570, 0.2802, 0.1702, 0.0536, 0.0113,
0.0157],
[0.0291, 0.0355, 0.0795, 0.1375, 0.1992, 0.2212, 0.1488, 0.0780, 0.0376,
0.0337],
[0.0518, 0.0196, 0.1209, 0.2296, 0.2699, 0.1419, 0.0686, 0.0596, 0.0187,
0.0192],
[0.0348, 0.0399, 0.0847, 0.1438, 0.1954, 0.2086, 0.1374, 0.0795, 0.0407,
0.0353],
[0.0304, 0.0156, 0.0889, 0.2146, 0.3070, 0.1885, 0.0862, 0.0429, 0.0105,
0.0153],
[0.0561, 0.0247, 0.1231, 0.2177, 0.2498, 0.1417, 0.0734, 0.0669, 0.0242,
0.0224]])

Could you explain in more detail how to interpret these scores?
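
A brief, unofficial note on reading these numbers: each row appears to be a probability distribution over 10 score bins (AVA-style, scores 1 to 10), so a common way to get a single aesthetic score per image is the mean of the distribution. A minimal sketch, assuming that interpretation:

import torch

# `probs` holds one 10-bin distribution per image; here only the first row shown above.
probs = torch.tensor([[0.0145, 0.0029, 0.0645, 0.2702, 0.4544,
                       0.1338, 0.0399, 0.0154, 0.0011, 0.0032]])

bins = torch.arange(1, 11, dtype=probs.dtype)   # scores 1..10
mean_score = (probs * bins).sum(dim=1)          # expected score per image
print(mean_score)                               # ~4.81 for the first row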

Error on single image inference, thanks

I have tried to use the model to do aesthetic assessment on a single image, and below is my code snippet:

class IAA:
    def __init__(self, model_file, weights_path):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.init_model(model_file, weights_path)
        normalize = transforms.Normalize(mean=IMAGE_NET_MEAN, std=IMAGE_NET_STD)
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            normalize
        ])

    def init_model(self, model_file, weights_path):
        self.model = TANet(model_file)
        self.model.load_state_dict(torch.load(weights_path))
        self.model.to(self.device)
        self.model.eval()

    def inference(self, image_path):
        with open(image_path, 'rb') as f:
            image = Image.open(f)
            image = image.convert('RGB')
            image = image.resize((224, 224))
            image = self.transform(image)
            image = torch.unsqueeze(image, dim=0)
            image = image.to(self.device)
            iia_score = self.model(image)
        return iia_score


if __name__ == '__main__':
    weights_path = r'.\weights\SRCC_758_LCC_765.pth'
    model_file = r'.\weights\resnet18_places365.pth.tar'
    image_path = r'.\samples\00000.png'
    iaa = IAA(model_file, weights_path)
    score = iaa.inference(image_path)

However, I got an error: ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 1]).

It seems this is a known error when using batchnorm on a single image (batch size = 1), see here (https://discuss.pytorch.org/t/error-expected-more-than-1-value-per-channel-when-training/26274/67). However, it was suggested that adding model.eval() solves the problem. As you can see, I used model.eval(), but I still got the error.

Did you happen to encounter this error? If so, how did you solve it? Thanks a lot.
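
Not an official fix, but one way to narrow this down: the error is raised by a BatchNorm layer that still believes it is in training mode. In PyTorch, any nn.Module constructed inside forward() (such as the self.lin = nn.Sequential(...) line quoted in the model-code issue above) is created with training=True even if model.eval() was called earlier. A generic diagnostic sketch, using the iaa object from the snippet above:

import torch.nn as nn

# List every BatchNorm layer and whether it is still in training mode; any layer
# printing training=True will raise exactly this ValueError on a batch of 1.
for name, module in iaa.model.named_modules():
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d)):
        print(name, '-> training =', module.training)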

Where is the demo inference code?

On this repo I only see the videos demonstrating the scoring effect; I don't see any demo code for running forward inference with the model. Has anyone reproduced this effect?

Dataset links expired

Hello, could you please share the dataset links again? They have expired.

How can I produce the video demo you showed?

Hello, I am a third-year undergraduate student. While working on a final course project, we found that your scoring system would be very helpful for automatically cropping our videos. However, since I have only just started working in this area, how can I produce a video like the one shown in your README? If I only run nnictl create --config config.yml -p 8999, it seems that nni is just training, right? How can I generate such a video?
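
There is no official demo script in the repo, so as a rough, unofficial starting point: the demo videos appear to show a per-frame aesthetic score overlaid on the footage, which can be approximated by scoring every frame with a loaded model and drawing the score with OpenCV. Everything below (the cv2 pipeline, the preprocessing, and the assumption that model(...) returns a 10-bin score distribution as in the "Results interpretation" issue above) is a sketch under those assumptions, not the authors' demo code:

import cv2
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def score_frame(model, frame_bgr, device):
    # OpenCV yields BGR uint8 frames; convert to an RGB PIL image, preprocess, and score.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    x = preprocess(Image.fromarray(rgb)).unsqueeze(0).to(device)
    probs = model(x).squeeze(0)                        # assumed: 10-bin score distribution
    bins = torch.arange(1, 11, dtype=probs.dtype, device=probs.device)
    return float((probs * bins).sum())                 # mean score in [1, 10]

def make_demo(model, in_path, out_path, device='cpu'):
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        score = score_frame(model, frame, device)
        cv2.putText(frame, f'aesthetic score: {score:.2f}', (20, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        writer.write(frame)
    cap.release()
    writer.release()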
