hi-sam's People

Contributors

ymy-k

hi-sam's Issues

Polygon3 error

I am getting this error while building the Polygon3 wheel. I am running Python 3.8 on Colab. Can someone help?

[image: error]

[Question] Line Segmentation Training on Custom Dataset

Hi,

I'm really excited that the training code is now released. What would be the best way to train Hi-SAM for line segmentation on custom datasets? My first intuition would be to convert the custom dataset into the format of the ones provided for line segmentation. However, the line segmentation model was trained on ctw1500, which is not included in the data prep guide. Any ideas on how to custom-train Hi-SAM for line segmentation?

Thanks in Advance

  • V

How is output token in Fig3 obtained?

Thanks for your splendid work. However, I could not find any description of how the output token before the S-Decoder and the output tokens before the H-Decoder are obtained. The paper says "Let $t^{s}_{out} \in \mathbb{R}^{1 \times 256}$ denote the inherited output token, which is the first slice of SAM's output tokens." but this is still a little ambiguous.
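For readers following this question, here is a minimal sketch of how the public SAM mask decoder orders its tokens: one IoU token, then the mask tokens, then the prompt tokens. String labels stand in for 256-dimensional vectors, and the helper names `build_tokens` and `split_outputs` are hypothetical. Reading "the first slice of SAM's output tokens" as the first mask token is only one possible interpretation, not something the paper or the Hi-SAM code confirms here.

```python
# Toy illustration of token ordering in the public SAM mask decoder.
# Labels stand in for 256-dim vectors; this is NOT Hi-SAM's actual code.
NUM_MASK_TOKENS = 4  # SAM default: 3 multimask outputs + 1


def build_tokens(num_prompt_tokens):
    """Concatenate [iou token, mask tokens, prompt tokens] in SAM's order."""
    output_tokens = ["iou"] + [f"mask_{i}" for i in range(NUM_MASK_TOKENS)]
    prompt_tokens = [f"prompt_{j}" for j in range(num_prompt_tokens)]
    return output_tokens + prompt_tokens


def split_outputs(hs):
    """Slice the transformer output the way SAM does (positions are preserved)."""
    iou_token_out = hs[0]                        # hs[:, 0, :] in SAM
    mask_tokens_out = hs[1:1 + NUM_MASK_TOKENS]  # hs[:, 1:1+num_mask_tokens, :]
    return iou_token_out, mask_tokens_out


hs = build_tokens(num_prompt_tokens=2)
iou_token_out, mask_tokens_out = split_outputs(hs)
# Assumed reading of "first slice": the first of the mask tokens.
first_mask_token = mask_tokens_out[0]
print(iou_token_out, mask_tokens_out, first_mask_token)
```

Under this layout, index 0 is always the IoU token, so any "first slice" of the mask tokens starts at index 1 of the full token sequence.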

File "vit_h_maskdecoder.pth" is missing

I tried to run the Hi-SAM visualization demo for hierarchical segmentation using the following command:

python demo_hisam.py --checkpoint pretrained_checkpoint/hi_sam_l.pth --model-type vit_l --input demo/img293.jpg --output demo/ --hier_det

but got the error:
FileNotFoundError: [Errno 2] No such file or directory: 'pretrained_checkpoint/vit_h_maskdecoder.pth'

I did not find any reference to this file in the README. Do you provide this file for download anywhere?
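As a quick way to spot this kind of problem before launching the demo, the sketch below lists which checkpoint files are absent. The helper `missing_checkpoints` is hypothetical, and the two paths are simply the ones that appear in the command and the error message above; whether the demo needs any other files is not known from this issue.

```python
from pathlib import Path


def missing_checkpoints(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


# Paths taken from the command and the traceback in this issue.
expected = [
    "pretrained_checkpoint/hi_sam_l.pth",
    "pretrained_checkpoint/vit_h_maskdecoder.pth",
]

for path in missing_checkpoints(expected):
    print(f"missing: {path}")
```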

Poor Paragraph segmentation quality

Hi, thanks for sharing your work.

I found that demo_hisam.py produces poor-quality results.
I ran demo_hisam.py following your instructions:

python demo_hisam.py --checkpoint pretrained_checkpoint/hi_sam_l.pth --model-type vit_l --input demo/2e0cb33320757201.jpg --output demo/ --hier_det

This is how I ran your code, and I got the result below
(even though the pretrained weights were trained on HierText, which contains the input image).
[image: 2e0cb33320757201]

Could you tell me whether I made a mistake in using the demo?

Poor textline detection quality.

Hi, first of all, nice job.
I found that the quality of the text-line detection model is poor. To be more precise, many lines are simply not segmented.

To reproduce:
python demo_text_detection.py --checkpoint pretrained_checkpoint/line_detection_ctw1500.pth --model-type vit_h --input demo/1.jpg --output demo/ --dataset ctw1500

Images: [image 1], [image 2]

Results: [result 1], [result 2]

Question!!

  1. The coords are not in the [0,1]^2 square; they are just normalized to 0~1 via coords / img_size (img_size = 1024). Is that OK?

coords = 2 * coords - 1

  2. The paper (Section 3.5) says:
    "After the final token-to-image attention, we slice the last three output tokens and get $\hat{t}^{h}_{out} \in \mathbb{R}^{K \times 3 \times 256}$."

but in the code it looks like the first three output tokens are sliced:

iou_token_out = hs[:, 0, :] # (1, 256)

  3. Why is padding needed in prompt encoding?
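On the first point, the two-step mapping can be checked on a few sample points. This is a minimal sketch that assumes pixel coordinates lie inside a 1024 x 1024 input; the function name `normalize_points` is hypothetical, and whether Hi-SAM applies exactly these steps is not confirmed here.

```python
IMG_SIZE = 1024  # input side length mentioned in the question


def normalize_points(points, img_size=IMG_SIZE):
    """Map pixel coordinates to [0, 1], then shift and scale to [-1, 1]."""
    out = []
    for x, y in points:
        u, v = x / img_size, y / img_size    # step 1: coords / img_size
        out.append((2 * u - 1, 2 * v - 1))   # step 2: coords = 2 * coords - 1
    return out


print(normalize_points([(0, 0), (512, 512), (1024, 1024)]))
# [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
```

A point outside the image (for example x = 2048) would land outside [-1, 1], so the division only guarantees the [0,1]^2 assumption for in-bounds coordinates.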

Are there any plans for performance optimization?

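This is very cool work. I have long wanted to strengthen SAM's text segmentation ability but never succeeded, and you did it!

I use Hi-SAM to extract lyrics from music videos. Although there is still some interference from background pixels, overall it outperforms previous methods.

The only issue is that it is currently somewhat slow, which is probably down to SAM. There is already a lot of work on speeding up SAM inference, and I look forward to Hi-SAM becoming Faster-Hi-SAM!

A Hi-SAM with fast enough inference could become a powerful, production-ready foundational component!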

How does this method perform on Chinese datasets?

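Hello!
First of all, thank you for open-sourcing the code; the method design is also very interesting.
I would like to ask whether Hi-SAM's existing pretrained models can be applied directly to Chinese text datasets. The results in the paper are all on English datasets, so I wonder whether you have tried any experiments on Chinese.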
