Comments (9)
A while ago I did a major overhaul on the randomization stuff, implementing a new method (`BedTool._randomintersection` rather than `BedTool.randomintersection`) that fixed this.
Looks like I never made this method the default for `BedTool.randomstats()`.
To use the new method, you can specify `new=True` and provide a `genome_fn` to `BedTool.randomstats`. To see the difference (both in syntax and in cluttering of the temp dir), check out `test/prevent_open_file_regression`.
So for your example, this should do the trick:
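```python
import pybedtools

# `bed` and `loh` are the BedTool objects from your original example.
gfn = pybedtools.chromsizes_to_file(pybedtools.chromsizes('hg19'))
res = bed.randomstats(loh.fn, 100, processes=25, new=True, genome_fn=gfn)
```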
(Side note: if you take a look at the leftover temp files, I think they should all be genome files.)
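For reference, `genome_fn` points to a bedtools genome file: plain text with one tab-separated chromosome name and length per line. The sketch below writes one by hand. `write_genome_file` is a hypothetical helper (not pybedtools API), its input is assumed to be a plain name-to-length dict, and the sizes are toy values, not real hg19 lengths. In practice `pybedtools.chromsizes_to_file`, as used above, is the simpler route.

```python
import os
import tempfile


def write_genome_file(chrom_lengths, path):
    """Write a bedtools-style genome file: one "chrom<TAB>length" line per chromosome.

    Hypothetical helper; `chrom_lengths` is assumed to map name -> length.
    """
    with open(path, "w") as fh:
        for chrom, length in chrom_lengths.items():
            fh.write(f"{chrom}\t{int(length)}\n")
    return path


# Toy sizes for illustration only (not real assembly lengths).
toy_sizes = {"chr1": 1000, "chr2": 500}

with tempfile.TemporaryDirectory() as tmpdir:
    gfn = write_genome_file(toy_sizes, os.path.join(tmpdir, "toy.genome"))
    with open(gfn) as fh:
        contents = fh.read()

print(repr(contents))  # 'chr1\t1000\nchr2\t500\n'
```

Writing to a path you choose (instead of an auto-named temp file) also makes it easy to reuse the same genome file across many calls.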
from pybedtools.
That does the trick. Can `genome_fn` be a required argument to avoid this?
from pybedtools.
Yeah, that's probably best. I still need to do a little more cleaning up and "officially" deprecate the old `randomstats` method; when that happens, `genome_fn` will be required.
from pybedtools.
Got it.
Would you consider adding an `_orig_pool` kwarg to `random_op`? It'd be nice to be able to keep re-using a pool if I'm running this across multiple pairs of BED files.
from pybedtools.
Sure.
Implementation-wise, would you rather create your own pool and use it for various parallel calls like
```python
import multiprocessing

# Proposed usage; `bt`, `args` and `kwargs` are placeholders.
mypool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=mypool, *args, **kwargs)
bt.random_op(_orig_pool=mypool, *args, **kwargs)
bt.random_jaccard(_orig_pool=mypool, *args, **kwargs)
```
or have a `BedTool._pool` instance variable that, if `None`, is initialized with `n` processes, with subsequent calls (when `_orig_pool=True`) re-using that auto-created pool?
```python
# first call initializes a pool: BedTool._pool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)
# subsequent calls re-use BedTool._pool
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)
# set to None to re-initialize with a different number of processes
bt._pool = None
bt.randomstats(_orig_pool=True, processes=500, *args, **kwargs)
```
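The second option could be sketched roughly as follows. `PoolOwner`, `get_pool` and `reset_pool` are hypothetical names used only for illustration; `pool_factory` defaults to `multiprocessing.Pool` but can be swapped (for example for a thread pool) when testing.

```python
import multiprocessing


class PoolOwner:
    """Hypothetical holder for a lazily created, reusable worker pool.

    The first get_pool() call creates the pool; later calls return the same
    pool (even if `processes` differs) until reset_pool() is called.
    """

    def __init__(self, pool_factory=multiprocessing.Pool):
        self._pool = None
        self._pool_factory = pool_factory

    def get_pool(self, processes):
        # Create the pool on first use only.
        if self._pool is None:
            self._pool = self._pool_factory(processes)
        return self._pool

    def reset_pool(self):
        # Shut down the current pool so the next get_pool() builds a new one,
        # e.g. with a different number of processes.
        if self._pool is not None:
            self._pool.close()
            self._pool.join()
            self._pool = None
```

Here `reset_pool` closes and joins the old pool before dropping it; simply assigning `bt._pool = None` leaves shutting down the old workers up to garbage collection.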
from pybedtools.
I much prefer the former.
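A minimal sketch of the preferred, caller-owned option, assuming a helper that accepts the proposed `_orig_pool` kwarg. `run_trials` is hypothetical (not pybedtools API); it only shows the ownership rule: a pool passed in is used and left open, while a pool created internally is shut down before returning.

```python
import multiprocessing


def run_trials(worker, trial_args, processes=1, _orig_pool=None):
    """Map `worker` over `trial_args`, reusing a caller-owned pool if given.

    Hypothetical helper; the `_orig_pool` name follows the kwarg proposed
    in this thread.
    """
    if _orig_pool is not None:
        # Caller owns the pool: use it and leave it open for later calls.
        return _orig_pool.map(worker, trial_args)
    # No pool supplied: create one and shut it down before returning,
    # so no worker processes outlive this call.
    with multiprocessing.Pool(processes) as pool:
        return pool.map(worker, trial_args)
```

With this rule, the caller creates one `multiprocessing.Pool(25)` (ideally in a `with` block so it is shut down at the end), passes it to every call across all pairs of BED files, and pays the worker start-up cost only once.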
from pybedtools.
Sorry for putting this in this thread, but it's another open-file error. If I stream, does it leave the process open?
```python
from pybedtools import BedTool

a = BedTool('chr1 1 2', from_string=True)
b = BedTool('chr1 1 2', from_string=True)
for i in range(10000):
    print(i)
    c = a.intersect(b, stream=True)
```
Is that expected to leak?
from pybedtools.
In this case, I think the answer is yes:
The way streaming BedTools are closed is by hitting a `StopIteration` (see `cbedtools.IntervalIterator`). Since `c` in this example is never iterated over, it never gets a chance to raise a `StopIteration` to close the stream.
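A toy, pure-Python analogue of the close-on-`StopIteration` behaviour described above. `StreamingIntervals` is hypothetical and only mimics the idea, not the actual `cbedtools.IntervalIterator` code: a stream that is created but never iterated keeps its file handle open, while one that is run to the end closes it.

```python
import os
import tempfile


class StreamingIntervals:
    """Toy analogue of a streaming BedTool (hypothetical; not pybedtools code).

    The file handle is closed only when iteration reaches the end of the
    file and StopIteration is raised.
    """

    def __init__(self, path):
        self._fh = open(path)

    @property
    def closed(self):
        return self._fh.closed

    def close(self):
        self._fh.close()

    def __iter__(self):
        return self

    def __next__(self):
        if self._fh.closed:
            raise StopIteration
        line = self._fh.readline()
        if not line:
            # End of stream: release the handle, then signal exhaustion.
            self._fh.close()
            raise StopIteration
        return line.rstrip("\n").split("\t")


# A one-interval BED file, similar to BedTool('chr1 1 2', from_string=True).
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
    tmp.write("chr1\t1\t2\n")

unused = StreamingIntervals(tmp.name)   # created but never iterated
consumed = StreamingIntervals(tmp.name)
records = list(consumed)                # runs until StopIteration
print(unused.closed, consumed.closed)   # False True

unused.close()
os.remove(tmp.name)
```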
But it would be nice if the garbage collector saw that the streaming BedTool from iteration `i-1` no longer has any references and cleaned it up (would a `__del__` method be called then?). But this starts to get into the reference-counting part of Python and Cython that I don't have a handle on yet. Any ideas?
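On the garbage-collection question, a general Python sketch (not a pybedtools fix): `weakref.finalize` can close a handle once its owning object becomes unreachable, as an alternative to `__del__`. `AutoClosingHandle` is a hypothetical name. In CPython an object is freed as soon as its last reference goes away, so rebinding `c` in a loop releases the previous wrapper; this only helps when nothing else (another object, or a reference cycle) still holds on to it.

```python
import os
import tempfile
import weakref


class AutoClosingHandle:
    """Hypothetical wrapper that closes its file when it becomes unreachable.

    The finalizer holds a reference to the file's close method, not to the
    wrapper, so it does not keep the wrapper alive. It runs at most once:
    on collection, on an explicit close(), or at interpreter exit.
    """

    def __init__(self, path):
        self.fh = open(path)
        self._finalizer = weakref.finalize(self, self.fh.close)

    def close(self):
        self._finalizer()


with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
    tmp.write("chr1\t1\t2\n")

handles = []
for _ in range(3):
    c = AutoClosingHandle(tmp.name)  # rebinding c drops the previous wrapper
    handles.append(c.fh)

states = [h.closed for h in handles]
print(states)  # [True, True, False] in CPython
c.close()
os.remove(tmp.name)
```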
from pybedtools.
I tried a number of things, including `__del__`, but can't get it to work. It doesn't collect them until the program terminates.
Streaming over the results does prevent the error in this case.
I'm getting another open-file-handles error that I haven't been able to create a small test case for.
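One way to turn a suspected leak into a small test case is to count the process's open file descriptors before and after the code in question. A sketch, assuming a Linux-style `/proc/self/fd` or macOS/BSD-style `/dev/fd` directory; `count_open_fds` is a hypothetical helper.

```python
import os


def count_open_fds():
    """Return the number of file descriptors open in this process.

    Assumes /proc/self/fd (Linux) or /dev/fd (macOS/BSD) exists.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise RuntimeError("no fd directory found on this platform")


before = count_open_fds()
fh = open(os.devnull)
during = count_open_fds()
fh.close()
after = count_open_fds()
print(during - before, after - before)  # 1 0
```

Calling `count_open_fds()` before and after the `BedTool` loop from earlier in the thread (or every few hundred iterations) would show whether the count keeps growing.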
from pybedtools.