Giter Club home page Giter Club logo

Comments (23)

pjb7687 avatar pjb7687 commented on May 18, 2024

Firstly the first line of the input file should be location of fasta file, not a pattern.
Please edit your input file and try again.

And the fasta file should start with '>' for each sequence!

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Hi, sorry for the confusion. I actually have the correct input files, I was only providing the info.
Here is an example:

cas-offinder> cat t.in
bg
NNNNNNNNNNNNNNNNNNNNNNN
GGCCGACCTGTCGCTGACGCAGG 5
cas-offinder> ls -lt bg
total 0
-rwxrwxrwx 1 owner Users 28 Apr 13 09:08 test.fa
cas-offinder> cat bg/test.fa

s1
GGCCGACCTGTCGCTGACGCAGG
cas-offinder> ./cas-offinder t.in C out
Total 1 device(s) found.
Loading input file...
Reading bg/test.fa...
Sending data to devices...
Chunk load started.
1 devices selected to analyze...
Finding pattern in chunk #1...
Comparing patterns in chunk #1...

Segmentation fault

If we prefix "N" to the s1 sequence, it works correctly.

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Some other suggestions: (1) cas-offinder currently can only take a directory name as the first input line, it would be really great if it can use a sequence file name instead. As we keep the hg38.fa file in a folder that contains many other sequence files. If I made a folder and create a symbolic link that points to hg38.fa, cas-offinder does not seem to be able to follow a symbolic link. (2) it will be great if we can specify the number of CPU/GPU to be used. We have a shared GPU cluster, if some GPUs are already in use, it will be great if we can tell cas-finder not to use all GPU devices. Thank you for your kind consideration!

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

Thank you so much for your bug report and suggestions.

  1. I fixed the bug!
  2. Now Cas-OFFinder correctly follows symlinks. In addition, if a chromosome file is specified instead of the directory then it will be used for analysis.
  3. By specifying an integer number after C/G/A (e.g. cas-offinder input.txt G10 output.txt), user can limit the maximum number of devices for analysis. If user don't specify any number (as previous version), then Cas-OFFinder will try to use maximum number of GPUs installed on system.

You can checkout and pull the latest version of develop branch for the above changes. They will be applied to master branch & released public after some testing.

Best,
Jeongbin

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Thank you so much for implementing all these requests so quickly!!
There seems to be some typos in the development branch:
(1) It does not compile well under Linux
read_fasta.cpp: input.open(buf); # buf should be path_buf?
read_twobit.cpp: input = fopen(buf, "rb"); # same, buf should be path_buf
Also, the follow does not compile. We probably need to define 'FILE *input' first, outside the ifdef/else
Then change FILE *input=... into input=...

#ifdef _WIN32
FILE *input = fopen(filepath.c_str(), "rb");
#else
... if (path_cnt >= 0)
FILE *input = fopen(path_buf, "rb");
else
FILE *input = fopen(filepath.c_str(), "rb");
(2) I made these changes and tested it, it does not seem to take a number after "G"
./cas_dev t.in G4 ooo
clCreateContext Failed: -33

(3) Since you are so open to suggestions, I have one last one. I try to use cas-offinder to implement what Broad group did in their 2015 paper. Since they allow up to 5 mismatches, cas-offinder will generate tons of output (for the many sgRNAs, I will be looking at GBs' output). So I plan to modify cas-offinder, so that it can dump output to stdout, then pipe the stdout to Broad's cfd-score-calculation and only keep the lines that has significant off-target scores. E.g., I will run:
cas-offinder input G - | cdf-score-calculation > cas_out.txt
So when output file name is "-", we can write to stdout, I guess you probably can support that. Now the question is you have other message lines also goes to stdout (instead of stderr), it is hard for cdf-score-calculation to figure out which line contains message, which contains actual data. I plan to modify cas-finder to prefix all data line with a character "@", if output file is "-", you might have a more creative and cleaner solution to this use case.

Thanks again for your effort in providing such a high-performance piece of program!

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024
  1. fixed!
  2. It works in my environment. Anway -33 is CL_INVALID_DEVICE, it occurs when the device is invalid or not accessible. Would you try other numbers again?
  3. Nice idea. I think it is easy, just I would put the message lines towards the stderr. I will implement it soon.

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

Done. Would you test it again?

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Thanks! It works great. "-" triggers output to stdout!
I am not familiar with cuda, but I cannot see to get Gx to work.
As shown below, this computer has 6 GPUs, I tried number 1 to 6, only 6 works.

$ ./cas_dev t.in G out
Total 6 device(s) found.
Loading input file...
Reading all_exonome.fa...
Sending data to devices...
Chunk load started.
6 devices selected to analyze...
Finding pattern in chunk #1...
Comparing patterns in chunk #1...
6.4111 seconds elapsed.
$ ./cas_dev t.in G1 out
Segmentation fault
$ ./cas_dev t.in G2 out
clCreateContext Failed: -33
$ ./cas_dev t.in G3 out
Segmentation fault
$ ./cas_dev t.in G4 out
clCreateContext Failed: -33
$ ./cas_dev t.in G5 out
Segmentation fault
$ ./cas_dev t.in G6 out
Total 6 device(s) found.
Loading input file...
Reading all_exonome.fa...
Sending data to devices...
Chunk load started.
6 devices selected to analyze...
Finding pattern in chunk #1...
Comparing patterns in chunk #1...
6.19232 seconds elapsed.

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

I fixed it, there was an error while counting the number of available devices for each platform.

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Perfect, works as expected! (The only minor thing is you might want to update the usage text to show the new features)

Also, just an interesting thing (I test to query 6 sequences against human exonome)
Performance:
G6: 13.7023 seconds elapsed.
G5: 9.86336 seconds elapsed.
G4: 11.3326 seconds elapsed.
G3: 7.96847 seconds elapsed.
G2: 9.24136 seconds elapsed.
G1: 3.79438 seconds elapsed.

Thanks again for making all the improvements so quickly, this tools is perfect now!

$ ./cas_dev t.in G6 - > /dev/null
Total 6 device(s) found.
Loading input file...
Reading all_exonome.fa...
Sending data to devices...
Chunk load started.
6 devices selected to analyze...
Finding pattern in chunk #1...
Comparing patterns in chunk #1...
13.7023 seconds elapsed.

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Maybe you have a bug, G1 actually means using all GPUs and G6 means use only one? The performance data above seems odd.

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

I think because I/O latency dominates when Cas-OFFinder analyze such a small number of targets, I expect different result when the number of targets becomes so large. Well, I will test it anyway.

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Hi, I have trouble with GRID K520.
I try to run the GPU job on Amazon GPU instance, now, which uses GRID K520.
The code has segmentation fault during initOpenCL.
My command line is : cas-offinder t.in G -

I added a few cerr line into initOpenCL()

    m_devnum = 0;
    cerr << "platform_cnt=" << platform_cnt << endl;
    for (i = 0; i < platform_cnt; i++) {
            platform_maxdevnum = maxdevnum - m_devnum;
            cerr << "i=" << i << " maxdevnum=" << maxdevnum << " m_devnumn=" << m_devnum << endl;
            if (platform_maxdevnum > 0) {
                    oclGetDeviceIDs(platforms[i], m_devtype, platform_maxdevnum, devices + m_devnum, &device_cnt);
                    m_devnum += MIN(device_cnt, platform_maxdevnum);
            }
            cerr << "Processed: " << i << "  m_devnum=" << m_devnum << endl;
    }

Here is the output (there are only 2 GPUs, however, m_devnum is 10 intead of 1 after process i=0)
The old versions seems to run, so the bug is related to the new logic introduced.
Thanks!

[ec2-user@ip-172-31-14-64 cas-offinder-develop_new]$ ./cas-offinder ../t.in G -
platform_cnt=2
i=0 maxdevnum=1000 m_devnumn=0
Processed: 0 m_devnum=10
i=1 maxdevnum=1000 m_devnumn=10
Processed: 1 m_devnum=11
Segmentation fault

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

Hi, would you add the same cerr codes to the old version of Cas-OFFinder?
I think there should be no difference, so weird.

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

With old version:
[ec2-user@ip-172-31-14-64 cas-offinder-master]$ ./cas-offinder
...
Available device list:
Type: CPU, ' Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz'
Type: GPU, ' Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz'
Type: ACCELERATOR, ' Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz'
Type: GPU, 'GRID K520'

I added the following debugging lines:
m_devnum = 0;
cerr << "A. platform_cnt=" << platform_cnt << " m_devtype=" << m_devtype << " MAX_DEVICE_NUM=" << MAX_DEVICE_NUM << " device_cnt=" << device_cnt << " m_devnum=" << m_devnum << " devices=" << devices << endl;
for (i = 0; i < platform_cnt; i++) {
cerr << "i=" << i << endl;
oclGetDeviceIDs(platforms[i], m_devtype, MAX_DEVICE_NUM - m_devnum, devices + m_devnum, &device_cnt);
cerr << "B. platform_cnt=" << platform_cnt << " m_devtype=" << m_devtype << " MAX_DEVICE_NUM=" << MAX_DEVICE_NUM << " device_cnt=" << device_cnt << " m_devnum=" << m_devnum << " devices=" << devices << endl;
m_devnum += device_cnt;
}
cerr << "C. platform_cnt=" << platform_cnt << " m_devtype=" << m_devtype << " MAX_DEVICE_NUM=" << MAX_DEVICE_NUM << " device_cnt=" << device_cnt << " m_devnum=" << m_devnum << " devices=" << devices << endl;
Output:
[ec2-user@ip-172-31-14-64 cas-offinder-master]$ ./cas-offinder test.in G1 t
A. platform_cnt=2 m_devtype=4 MAX_DEVICE_NUM=1000 device_cnt=0 m_devnum=0 devices=0x7ffd30b669a0
i=0
B. platform_cnt=2 m_devtype=4 MAX_DEVICE_NUM=1000 device_cnt=0 m_devnum=0 devices=0x7ffd30b669a0
i=1
B. platform_cnt=2 m_devtype=4 MAX_DEVICE_NUM=1000 device_cnt=1 m_devnum=0 devices=0x7ffd30b669a0
C. platform_cnt=2 m_devtype=4 MAX_DEVICE_NUM=1000 device_cnt=1 m_devnum=1 devices=0x7ffd30b669a0
Total 1 device(s) found.

With the new version:
m_devnum = 0;
cerr << "A. platform_cnt=" << platform_cnt << " maxdevnum=" << maxdevnum << " m_devtype=" << m_devtype << " device_cnt=" << device_cnt << " m_devnum=" << m_devnum << " devices=" << devices << endl;
for (i = 0; i < maxdevnum; i++) {
cerr << "i=" << i << endl;
oclGetDeviceIDs(platforms[i], m_devtype, maxdevnum - m_devnum, devices + m_devnum, &device_cnt);
cerr << "B. platform_cnt=" << platform_cnt << " maxdevnum=" << maxdevnum << " m_devtype=" << m_devtype << " device_cnt=" << device_cnt << " m_devnum=" << m_devnum << " devices=" << devices << endl;
m_devnum += device_cnt;
}
cerr << "C. platform_cnt=" << platform_cnt << " maxdevnum=" << maxdevnum << " m_devtype=" << m_devtype << " device_cnt=" << device_cnt << " m_devnum=" << m_devnum << " devices=" << devices << endl;

[ec2-user@ip-172-31-14-64 cas-offinder-develop_new]$ ./cas-offinder ../t.in G1 -
A. platform_cnt=2 maxdevnum=1 m_devtype=4 device_cnt=10 m_devnum=0 devices=0x12370a0
i=0
B. platform_cnt=2 maxdevnum=1 m_devtype=4 device_cnt=10 m_devnum=0 devices=0x12370a0
C. platform_cnt=2 maxdevnum=1 m_devtype=4 device_cnt=10 m_devnum=10 devices=0x12370a0
Segmentation fault
[ec2-user@ip-172-31-14-64 cas-offinder-develop_new]$ ./cas-offinder ../t.in G t
A. platform_cnt=2 maxdevnum=1000 m_devtype=4 device_cnt=10 m_devnum=0 devices=0x877030
i=0
B. platform_cnt=2 maxdevnum=1000 m_devtype=4 device_cnt=10 m_devnum=0 devices=0x877030
i=1
B. platform_cnt=2 maxdevnum=1000 m_devtype=4 device_cnt=1 m_devnum=10 devices=0x877030
i=2
clGetDeviceIDs Failed: -32

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

It looks like device_cnt should be initialized to zero before running clGetDeviceIDs.
Would you try the latest commit again? Maybe it also should be applied to master, obviously this is a potential bug.

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Hi, I rebooted my Amazon instance and just tested it. Somehow the error is gone. I am no longer able to reproduce it, so I can only assume it my own fault somehow. Thank you so much for your patience, I really appreciate all your helps!

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

I am now testing with 100 gRNA as input, the background sequence is at 10% of the genome, so there is decent amount of computation.
Run time statistics does seem odd (test with a server with 6 GPU devices, cas-offinder detected six devices correctly).

Running 1 G6 job on 100 sgRNA takes 332.941 sec. This is very strange compared to numbers below.
Running 2 G3 jobs on 100 sgRNA each takes ~35 sec. each
Running 3 G2 jobs on 100 sgRNA each takes ~ 1190 sec. each
Running 6 G1 jobs on 100 sgRNA each (2-4 jobs will crash with: clEnqueueNDRangeKernel Failed: -4)
#!/bin/sh
date > t0.start; ./cas-offinder t.in G1 t0.ou &
date > t1.start; ./cas-offinder t.in G1 t1.ou &
date > t2.start; ./cas-offinder t.in G1 t2.ou &
date > t3.start; ./cas-offinder t.in G1 t3.ou &
date > t4.start; ./cas-offinder t.in G1 t4.ou &
date > t5.start; ./cas-offinder t.in G1 t5.ou &
The remaining good runs takes about ~145 sec each.

Looks like there is huge benefit of running jobs as G1, if we can solve the clEnqueueNDRangeKernel issue.

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

I can only assume it my own fault somehow.

No I don't think so. This is obviously a bug in Cas-OFFinder. The 'device_cnt' variable must be initialized first, according to the spec of OpenCL. It might work sometimes without proper initialization, but it is not a complete solution.

The speed of computation depends on many things. Memory size, vectorization, interface speed of PCIe slots, driver, and so on. In other words, computation time does not always decrease along the number of devices. That means it is so reasonable to include a feature for user to limit the number of computing devices in the future version of Cas-OFFinder.

Because in this case all Cas-OFFinder processes try to use the first device simultaneously, then clEnqueueNDRangeKernel must fail (usually I don't recommend you to run Cas-OFFinder on one device more than 2 instances). Hmm, maybe it would be better to change the meaning of number after 'G', e.g. maximum number of GPU -> index of GPU?

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Hmm, maybe it would be better to change the meaning of number after 'G', e.g. maximum number of GPU -> index of GPU?
That would be good, I run more testing, I can tell it makes a lot more sense to run multiple G1 jobs than a job using multiple devices.
Do you consider a syntax like G4.0.1.2.4? G4 stands for using 4 devices and don't care which four. G4.0.1.2.4 means using 4 devices with ID 0, 1, 2, 4? Then we can submit G1.0, G1.1, G1.2 ...

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

Yes, but I would like to suggest below syntax for specifying devices using : and ,.

examples)
G1:3 (GPU 1 to 3)
G2,4 (GPU 2 and 4)
C1 (CPU 1)

rules)

  1. Device ID starts with 1. Dry-run of Cas-OFFinder should print device ID for each type of device.
  2. Both bounds are inclusive for :.
  3. Specifying total number of GPU is not necessary anymore.

What do you think?

from cas-offinder.

brownmk avatar brownmk commented on May 18, 2024

Looks good to me. I assume others won't have a case where they need 1 GPU, but don't know which one is available. In our GPU cluster, we can get the GPU ID of the free devices, so this is not a problem for us.

from cas-offinder.

pjb7687 avatar pjb7687 commented on May 18, 2024

Well, maybe when one needs to run several processes simultaneously, I think it would be useful. But I am not pretty sure how many people have such a big cluster with lots of GPUs installed in..

Anyway would you mind if I ask the name of GPU cluster that you use? Is it commercially available? It looks pretty nice.

from cas-offinder.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.