ymy-k / hi-sam
[arXiv preprint] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
License: Apache License 2.0
Hi,
I am really excited that the training code is now released. What would be the best way to train Hi-SAM for line segmentation on custom datasets? My first intuition would be to convert the custom dataset into the format of the datasets provided for line segmentation. However, the line segmentation model was trained on CTW1500, which is not included in the data prep guide. Any ideas on how to custom-train Hi-SAM for line segmentation?
Thanks in advance.
Thanks for your excellent work! When will the training code be released?
Thanks for your splendid work. However, I could not find any description of how to get the output token before the S-Decoder and the output tokens before the H-Decoder. The paper says, "Let $t^{s}_{out} \in \mathbb{R}^{1 \times 256}$ denote the inherited output token, which is the first slice of SAM's output tokens," but this is still a little ambiguous.
I tried to run the Hi-SAM Visualization Demo of hierarchical segmentation using the following command
python demo_hisam.py --checkpoint pretrained_checkpoint/hi_sam_l.pth --model-type vit_l --input demo/img293.jpg --output demo/ --hier_det
but got the error
FileNotFoundError: [Errno 2] No such file or directory: 'pretrained_checkpoint/vit_h_maskdecoder.pth'
I did not find any reference to this file in the README. Do you provide this file for download anywhere?
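A quick way to see which checkpoint files are absent is to check the paths before launching the demo. The sketch below is only a pre-flight check: the two paths are copied from the command and the error message above, and the helper name `missing_checkpoints` is hypothetical. It does not indicate where `vit_h_maskdecoder.pth` can be obtained.

```python
from pathlib import Path


def missing_checkpoints(paths):
    """Return the given paths (as strings) that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Paths copied from the demo command and the FileNotFoundError above.
    required = [
        "pretrained_checkpoint/hi_sam_l.pth",
        "pretrained_checkpoint/vit_h_maskdecoder.pth",
    ]
    for path in missing_checkpoints(required):
        print(f"missing: {path}")
```

Running this from the repository root lists every required file that is not in place, which makes it easier to tell a missing download from a wrong working directory.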
Hi, thanks for sharing your work.
However, I found that demo_hisam.py produces poor-quality results.
I tried to run demo_hisam.py following your instructions:
python demo_hisam.py --checkpoint pretrained_checkpoint/hi_sam_l.pth --model-type vit_l --input demo/2e0cb33320757201.jpg --output demo/ --hier_det
This is how I executed your code, and I got the result below.
(This is even with the pretrained weights trained on HierText, which contains the input image.)
Could you tell me whether I made a mistake in using the demo?
Thanks for your excellent work. Is the model effective for Chinese text?
Hi, first of all, nice job.
I found that the quality of the text-line detection model is poor. To be more precise, many lines are simply not segmented.
To reproduce:
python demo_text_detection.py --checkpoint pretrained_checkpoint/line_detection_ctw1500.pth --model-type vit_h --input demo/1.jpg --output demo/ --dataset ctw1500
Hi-SAM/hi_sam/modeling/prompt_encoder.py
Line 189 in e1ecd86
But in the code, it looks like the first three output tokens are sliced.
Hi-SAM/hi_sam/modeling/mask_decoder.py
Line 297 in e1ecd86
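The difference between the two readings can be made concrete with a toy example. The sketch below is not the Hi-SAM code: it uses plain Python lists as a stand-in for a token matrix, and the count of five tokens, the helpers `make_tokens` and `shape`, and the variable names are all assumptions for illustration. It only shows how keeping "the first slice" differs in shape from keeping the first three.

```python
def make_tokens(num_tokens=5, dim=256):
    """Build a toy (num_tokens x dim) matrix; row i is filled with float(i)."""
    return [[float(i)] * dim for i in range(num_tokens)]


def shape(matrix):
    """Return (rows, cols) of a rectangular list of lists."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


# Hypothetical stand-in for a stack of output tokens (5 tokens is an assumption).
tokens = make_tokens()

# Reading 1 (the paper's wording): keep only the first token, shape 1 x 256.
first_token = tokens[0:1]

# Reading 2 (what this issue reports seeing in the code): keep the first three.
first_three = tokens[0:3]

print(shape(first_token))  # (1, 256)
print(shape(first_three))  # (3, 256)
```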
I saw that this was asked in an earlier issue, but there has been no update for several months. Will the training code still be released?
How can Hi-SAM be modified for multi-class segmentation?
Can you explain more about line2paragraph_index in the dataset?
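Going only by its name, `line2paragraph_index` reads like a lookup from each text line to the paragraph that contains it. The sketch below illustrates that interpretation on made-up data. It is an assumption based on the field name, not a description confirmed by the repository, and the helper `group_lines_by_paragraph` is hypothetical.

```python
from collections import defaultdict


def group_lines_by_paragraph(line2paragraph_index):
    """Invert a line -> paragraph lookup into paragraph -> list of line ids.

    Assumes entry i holds the paragraph id of line i (an interpretation
    based on the field name only).
    """
    groups = defaultdict(list)
    for line_id, paragraph_id in enumerate(line2paragraph_index):
        groups[paragraph_id].append(line_id)
    return dict(groups)


# Made-up example: lines 0 and 1 belong to paragraph 0, lines 2 to 4 to paragraph 1.
example = [0, 0, 1, 1, 1]
print(group_lines_by_paragraph(example))  # {0: [0, 1], 1: [2, 3, 4]}
```

Under this reading, the field is what lets line-level masks be merged into paragraph-level groups.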
Does anyone know where to find a bilingual text segmentation dataset? Many thanks.
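This is very cool work. I have long wanted to strengthen SAM's text segmentation ability but never succeeded, and you did it!
I use Hi-SAM to extract lyrics from music videos (MVs). Although there is still some interference from background pixels, it outperforms previous methods overall.
However, the current speed is somewhat slow, which is probably down to SAM. There is already a lot of work on speeding up SAM inference, and I look forward to Hi-SAM becoming Faster-Hi-SAM!
A Hi-SAM with sufficiently fast inference could become a powerful, production-ready foundational component!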
Can SAM-TSS (only for text stroke segmentation) use Efficient Hi-SAM-S to improve performance? Testing is currently somewhat slow.
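Hello!
First of all, thank you to the authors for open-sourcing the code. The method design is also very interesting.
I would like to ask whether Hi-SAM's existing pretrained models can be applied directly to Chinese text datasets. The results in the paper are all on English datasets, so I wonder whether the authors have tried any experiments on Chinese.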