Comments (16)
You can fix this issue with:
diff --git a/pybedtools/helpers.py b/pybedtools/helpers.py
index 7733c15..7af0e00 100644
--- a/pybedtools/helpers.py
+++ b/pybedtools/helpers.py
@@ -301,7 +301,10 @@ def call_bedtools(cmds, tmpfn=None, stdin=None, check_stderr=None):
         print '\n\t' + '\n\t'.join(problems[err.errno])
         raise OSError('See above for commands that gave the error')
-    return output
+    try:
+        return output
+    finally:
+        output.close()

 def set_bedtools_path(path=""):
but I'm not sure if that's supported in 2.6, and I don't think it's in 2.5.
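For what it's worth, the semantics of `try`/`finally` with a `return` inside the `try` block are easy to check: the `finally` clause runs after the return value is computed but before the caller receives it, which is exactly what makes the patch above close `output` before handing it back. A minimal, self-contained sketch (Python 3 syntax here, just to demonstrate the behavior):

```python
def demo():
    try:
        return "value"
    finally:
        # runs after the return value is computed,
        # but before the caller receives it
        print("cleanup runs first")

print(demo())
# prints "cleanup runs first", then "value"
```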
Also, note that there's currently no __del__ method on a BedTool. I tried adding one, but it doesn't get called until the loop finishes.
disregard that last commit-comment . . .
tried the following, but it tries to close BedTools that are string- or iterator-based:
try:
    return output
finally:
    output.close()
then tried this, cause really what we want to close are the stdout file-like objects of Popen objects:
try:
    if hasattr(output, 'close'):
        output.close()
finally:
    return output
but tests fail on line 226 of helpers (https://github.com/daler/pybedtools/blob/master/pybedtools/helpers.py#L226), for tests that try to read a Popen.stdout from a "streamed" BedTool.
So the problem is that we don't want to always close a Popen.stdout, because it may be used later to chain together iterable BedTools. That is, the iterator can be set up during this call to call_bedtools, but it may not actually get used for an undefined amount of time. Maybe there's a way to detect whether it's been consumed or not?
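One way to answer the "has it been consumed?" question is a thin wrapper that flips a flag when the underlying iterator is exhausted. This is only a sketch, not anything in pybedtools; `TrackedIter` is a made-up name:

```python
class TrackedIter:
    """Hypothetical wrapper: records whether the wrapped
    iterator has been fully consumed."""

    def __init__(self, it):
        self._it = iter(it)
        self.consumed = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._it)
        except StopIteration:
            # the stream is exhausted; safe to close things here
            self.consumed = True
            raise

t = TrackedIter([1, 2])
print(list(t))     # [1, 2]
print(t.consumed)  # True
```

A cleanup hook (like closing a Popen.stdout) could go where the flag is set, so the close happens only once the stream has actually been drained.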
so the order in your 2nd block is reversed. I would guess
try:
    return output
finally:
    if hasattr(output, "communicate"):
        output.close()
Oops, right. Typo was in the comment; the correct code still has the line 226 problem.
I think the issue is in setting up the Popen object in the first place. The big try-except block in call_bedtools catches the OSError when there are too many files open. The problem, though, is that this call to call_bedtools should not have its stdout closed . . . really we want to close the first stdout ever opened in this session of pybedtools that is still open. Jeez, does that mean we need to maintain a stack of still-open Popen.stdouts?
I tried a bunch of stuff for this. I thought the most promising was:

def stream_iter(p):
    for s in p.stdout:
        yield s
    p.stdout.close()

output = stream_iter(p)
which could work but then it has problems with some of the assumptions later on.
And, since we never actually iterate over the stream in the above test-case, close will still not be called.
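For reference, here is the stream_iter idea in runnable, self-contained form (the subprocess command is just an illustrative stand-in for a bedtools call):

```python
import subprocess
import sys

def stream_iter(p):
    # yield lines from the subprocess's stdout, then close the
    # pipe once the stream is exhausted
    for line in p.stdout:
        yield line
    p.stdout.close()
    p.wait()

# stand-in for an actual bedtools invocation
p = subprocess.Popen(
    [sys.executable, "-c", "print('a'); print('b')"],
    stdout=subprocess.PIPE,
    text=True,
)
lines = list(stream_iter(p))
print(lines)            # ['a\n', 'b\n']
print(p.stdout.closed)  # True -- closed as a side effect of exhaustion
```

As noted above, the catch is that the close only happens if somebody actually drains the generator; an abandoned, never-iterated stream still leaks its file descriptor.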
So, one could then make a class with a __del__ method that closes the stream...
Note that the BedTool class currently has no __del__ method.
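Such a class might look like the sketch below. `ClosingStream` is a made-up name, and the caveat is exactly the one discussed in this thread: CPython's reference counting usually runs __del__ promptly, but the language does not guarantee when (or whether) it fires.

```python
import io

class ClosingStream:
    """Hypothetical wrapper: close the underlying stream when the
    wrapper is garbage-collected. Prompt in CPython via refcounting,
    but not guaranteed by the language."""

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        return iter(self.stream)

    def __del__(self):
        try:
            self.stream.close()
        except Exception:
            pass  # never let an exception escape __del__

s = io.StringIO("x\ny\n")
w = ClosingStream(s)
print(list(w))   # ['x\n', 'y\n']
del w            # in CPython, the stream is closed here
print(s.closed)  # True
```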
If the above test-case is really a problem, one can explicitly do c.fn.close() as a stop-gap.
yeah, the bug initially came up with all the random intersections, and I had to use that fn.close() thing in randomintersection:
https://github.com/daler/pybedtools/blob/master/pybedtools/bedtool.py#L1709
but that's inelegant, and requires a user who is doing tons of iterations on their own outside of the randomintersection method (like the test case above) to know they have to fn.close() their BedTools. I played around with __del__ methods too, to no avail . . . hence its continued absence. I'll have to think about your stream_iter idea though.
While iterating over some bed files, I call field_count() to get a particular column. Python crashes without a traceback after precisely 1008 field_count() calls on my machine.
Try the following with a 2000-line bed file:
a = pybedtools.BedTool('test.bed')
for i, b in enumerate(a):
    print i
    x = a.field_count()
and python will choke with Error: The requested bed file (test.bed) could not be opened. Exiting!
Somewhere around 1k lines. I can easily work around this by keeping the field_count around, but it seems like the same fishy issue as randomintersect (hence posting here instead of a new issue).
I think the exit on line https://github.com/daler/pybedtools/blob/master/src/bedFile.cpp#L41 is being called, and I think the reason is that there are too many files open. Perhaps the __iter__ at https://github.com/daler/pybedtools/blob/master/pybedtools/bedtool.py#L618 is not being cleaned up properly? Unlike randomintersection, I can't explicitly fn.close() the BedTool since I'm not done iterating over it.
Yeah, this is a tough bug and I still don't have a good solution yet. Iterators don't get cleaned up properly, and deleting a BedTool doesn't get rid of the open file reference.
For the specific case of checking how many fields you're working with. . . It depends on what you're using it for, but checking the length of an Interval directly will be way more efficient, and works no prob on large files:
a = pybedtools.BedTool('test.bed')
for i, b in enumerate(a):
    print i
    x = len(b.fields)
as of cdb789c this appears to be working. That is, relevant tests pass, and the previous test-case above works . . . specifically, this used to fail with an OSError because of too many open files, but now works:
import pybedtools
import sys

a = pybedtools.example_bedtool('a.bed')
b = pybedtools.example_bedtool('b.bed')
for i in range(10000):
    sys.stderr.write('\r%s' % i)
    sys.stderr.flush()
    c = a.intersect(b, stream=True)
    len(c)
The fix is just on the iterator side, specifically in IntervalIterator.__next__. See relevant code here. When the stream is being read from subprocess.PIPE, the pipe is closed when it's done reading. An open file will also be closed upon hitting StopIteration; I don't think this latter case will cause any problems, unless the user wants to seek and read inside the file or something.
The part about this fix that I don't fully understand is that if you trap only the StopIteration in IntervalIterator.__next__, it still tries to call next() on the stream, even though a StopIteration has been re-raised immediately after closing. This results in a ValueError because it tried to call next() on a closed file object. Trapping this ValueError as well as the StopIteration seems to have fixed the original issue. But I don't know why the first StopIteration isn't being propagated up to the IntervalIterator caller. Couldn't find any documentation on __next__ methods in Cython that discussed this, either.
It appears to work well for me, though. More testing probably needed before closing the issue.
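The ValueError behavior described above is easy to reproduce outside pybedtools: once a file is closed, calling next() on an iterator over it raises ValueError ("I/O operation on closed file"), not StopIteration. A self-contained demonstration using a throwaway temp file:

```python
import os
import tempfile

# make a small throwaway file to iterate over
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("line1\nline2\n")
tmp.close()

f = open(tmp.name)
it = iter(f)
first = next(it)  # 'line1\n' -- works while the file is open
f.close()
try:
    next(it)      # iterating a closed file...
    err = None
except ValueError as e:
    err = e       # ...raises ValueError, not StopIteration
print(type(err).__name__)  # ValueError
os.unlink(tmp.name)
```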
Has been working well for a month now, so closing.
I'm getting a similar error with the current version of pybedtools, can you help?:
<type 'exceptions.OSError'>: Too many open files
The command was:
multiBamCov -bed di.bed -bams 1.all.bam 2.all.bam
Things to check:
* Too many files open -- please submit a bug report so that this can be fixed
Could you provide some code that reproduces the problem?
I'm struggling to get this to display correctly, maybe because it's just pseudocode, but I hope you get the idea. It should be indented from the "for" to the line with two "##" but I must not know how to use the proper syntax for git.
There is lots of code so I'll try to keep it concise:
(@dalerr edit: you can use three backticks to get nice syntax-highlighted code blocks)
f = open("file with exon coordinates")
exp1 = "bam filename for experiment1"
exp2 = "bam filename for experiment2"
countList = list()
for line in f.readlines():
    upIntron_coords, downIntron_coords = lineparse(line)  # parse upstream and downstream intron coordinates
    upIntron = pybedtools.BedTool(upIntron_coords, from_string=True).saveas("uI.bed")
    downIntron = pybedtools.BedTool(downIntron_coords, from_string=True).saveas("dI.bed")
    count_uI = upIntron.multi_bam_coverage(bams=[exp1, exp2], D=True).saveas("countfile")
    del upIntron
    counts_up = parse(count_uI)
    del count_uI
    count_dI = downIntron.multi_bam_coverage(bams=[exp1, exp2], D=True).saveas("countfile")
    del downIntron
    counts_down = parse(count_dI)
    del count_dI
    countList.append([counts_up, counts_down])
    ## gets here about 50 times then raises the exception that there are too many files open
exit()
Ah, so you're creating two new BedTool objects for every line. Neither I nor any collaborators have been able to successfully free up the open file in the BedTool.__del__ method, which means that calling del does not close the file.
As a result, you will quickly reach your OS's open file limit.
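For reference, the per-process limit the loop is running into can be inspected with the stdlib resource module (Unix only):

```python
import resource  # Unix-only stdlib module

# RLIMIT_NOFILE is the maximum number of open file descriptors
# for this process; the soft limit is the one actually enforced
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft)  # often 1024 on Linux -- two leaked files per input
             # line exhausts this after a few hundred iterations
```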
I would recommend creating two complete BED files first out of your input file, something like:
def split_input(fn):
    fout1 = open(fn + '.up', 'w')
    fout2 = open(fn + '.down', 'w')
    for line in open(fn):
        up, down = lineparse(line)
        fout1.write(up)
        fout2.write(down)
    fout1.close()
    fout2.close()
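A variant of the same splitter using context managers, which guarantees both output handles are closed even if parsing raises. Note that `lineparse` here is a made-up stand-in for the user's own parser, and the tab-separated toy record is invented for the demo:

```python
def lineparse(line):
    # stand-in parser: assume each record is "up<TAB>down"
    up, down = line.rstrip("\n").split("\t")
    return up + "\n", down + "\n"

def split_input(fn):
    # 'with' closes both files automatically, even on error
    with open(fn + ".up", "w") as fup, open(fn + ".down", "w") as fdn:
        for line in open(fn):
            up, down = lineparse(line)
            fup.write(up)
            fdn.write(down)

# toy input file for the demo
with open("toy.txt", "w") as f:
    f.write("chr1 1 100\tchr1 200 300\n")

split_input("toy.txt")
print(open("toy.txt.up").read())  # chr1 1 100
```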
Then, run multi_bam_coverage on each file, and then parse the results.
bt_up = pybedtools.BedTool(fn + '.up')
bt_dn = pybedtools.BedTool(fn + '.down')
count_uI = bt_up.multi_bam_coverage(bams=[exp1, exp2], D=True).saveas('uI.bed')
count_dI = bt_dn.multi_bam_coverage(bams=[exp1, exp2], D=True).saveas('dI.bed')
# then parse count_uI and count_dI
This way, you only ever create 4 BedTool objects. This should run much faster, too.
See #55 for more info about this, for a user with a similar problem. Time for me to write some docs on this, I think.
Thanks. I'll try to work that in...