Comments (14)
Ok. Can you post the full .log file for the failed run?
from plink-ng.
PLINK v2.00a2LM 64-bit Intel (15 Feb 2018) www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to testchr22.log.
Options in effect:
--bgen ukb_imp_chr22_v2.bgen
--make-pgen
--out testchr22
--sample ukb672_imp_chr22_v2_s487406.sample
Start time: Fri Feb 16 12:08:33 2018
257847 MB RAM detected; reserving 128923 MB for main workspace.
Allocated 7259 MB successfully, after larger attempt(s) failed.
Using up to 32 threads (change this with --threads).
--bgen: 1255680 variants detected, format v1.2.
Error: File read failure.
End time: Fri Feb 16 12:08:34 2018
from plink-ng.
Okay, pretty sure I know what the problem is, posting what I think is a fix to GitHub in a few minutes.
from plink-ng.
Thanks a lot & again, thanks for developing this great tool!
from plink-ng.
I'm seeing a similar issue with the 30 Jul 2018 build. It fails at seemingly random positions.
Below are three different runs with slightly different parameters and three different points of failure.
Is this a bug, or has it been addressed in newer alpha releases?
1)
PLINK v2.00a2LM 64-bit Intel (30 Jul 2018) www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ukb_hDMg_QC_14.log.
Options in effect:
--bgen b_imp_chr14_v3.bgen
--hardy
--memory 30000
--missing
--out b_hDMg_QC_14
--sample Bgen.sample
--threads 1
Start time: Tue Sep 11 14:18:15 2018
1031229 MiB RAM detected; reserving 30000 MiB for main workspace.
Using 1 compute thread.
--bgen: 3037521 variants detected, format v1.2.
487409 samples imported from .sample file to ukb_hDMg_QC_14-temporary.psam .
--bgen: 312k variants scanned.
Error: File read failure.
End time: Tue Sep 11 14:31:03 2018
2)
PLINK v2.00a2LM 64-bit Intel (30 Jul 2018) www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ukb_hDMg_QC_14.log.
Options in effect:
--bgen b_imp_chr14_v3.bgen
--hardy
--missing
--out b_hDMg_QC_14
--sample Bgen.sample
--threads 4
Start time: Tue Sep 11 14:48:48 2018
257272 MiB RAM detected; reserving 128636 MiB for main workspace.
Using up to 4 compute threads.
--bgen: 3037521 variants detected, format v1.2.
487409 samples imported from .sample file to ukb_hDMg_QC_14-temporary.psam .
--bgen: 26k variants scanned.
Error: File read failure.
End time: Tue Sep 11 14:57:56 2018
3)
PLINK v2.00a2LM 64-bit Intel (30 Jul 2018) www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ukb_hDMg_QC_14.log.
Options in effect:
--bgen b_imp_chr14_v3.bgen
--hardy
--missing
--out b_hDMg_QC_14
--sample Bgen.sample
--threads 1
Start time: Tue Sep 11 15:00:36 2018
257272 MiB RAM detected; reserving 128636 MiB for main workspace.
Using 1 compute thread.
--bgen: 3037521 variants detected, format v1.2.
487409 samples imported from .sample file to ukb_hDMg_QC_14-temporary.psam .
--bgen: 24k variants scanned.
Error: File read failure.
End time: Tue Sep 11 15:11:56 2018
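For comparing runs like the three above, a small script can pull the last progress line and the error out of each log. This is only a rough sketch: the function name `summarize_log` and the regular expression are illustrative assumptions based on the log lines pasted in this thread, not anything shipped with plink2.

```python
import re

# Matches progress lines such as "--bgen: 312k variants scanned."
# (pattern is an assumption derived from the logs pasted in this thread)
PROGRESS_RE = re.compile(r"^--bgen: (\d+)k variants (scanned|converted)\.$")


def summarize_log(text):
    """Return (last_progress_in_thousands, error_message) for one log's text.

    Either element is None if the log contains no matching line.
    """
    last_progress = None
    error = None
    for line in text.splitlines():
        line = line.strip()
        match = PROGRESS_RE.match(line)
        if match:
            last_progress = int(match.group(1))
        elif line.startswith("Error:"):
            error = line[len("Error:"):].strip()
    return last_progress, error


# Excerpt of run 1 from this comment.
sample_log = """\
--bgen: 3037521 variants detected, format v1.2.
--bgen: 312k variants scanned.
Error: File read failure.
"""

print(summarize_log(sample_log))
```

Running it over every .log file from a batch of jobs makes it easier to see whether the failure point really is random or clusters around particular runs.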
from plink-ng.
This looks like an unfixed bug; will try to replicate it today. Are there any differences between your b_imp_chr14_v3.bgen and the raw ukb_imp_chr14_v3.bgen file that I should be aware of?
from plink-ng.
None. They are exactly the same.
The issue comes up with both --make-pgen and --make-bed.
However, if I subset just the first 700k variants on that chromosome, the run gets well past the variant counts at which the runs above failed.
Job log as of right now and still processing:
PLINK v2.00a2LM 64-bit Intel (30 Jul 2018) www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ukb_hDMg_QC_14.log.
Options in effect:
--bgen SubsetChr14_700k.bgen
--hardy
--missing
--out b_hDMg_QC_14
--sample Bgen.sample
--threads 10
Start time: Tue Sep 11 17:32:30 2018
64225 MiB RAM detected; reserving 32112 MiB for main workspace.
Using up to 10 threads (change this with --threads).
--bgen: 697895 variants detected, format v1.2.
487409 samples imported from .sample file to ukb_hDMg_QC_14-temporary.psam .
--bgen: 537k variants converted.
from plink-ng.
Successful .pgen conversion with a subset. Could this be a memory-handling or dataset-size issue?
from plink-ng.
I'm primarily interested in sets of parameters that maximize the chance of a crash right now so I can investigate this properly. How quickly does --memory 30000 --threads 4 on the original (non-subsetted) dataset crash?
from plink-ng.
So, I am running these jobs through the Slurm queue management system on our HPC cluster, where tasks are dispatched to different nodes. The failing jobs all seem to be directed to a particular set of hosts. Let me investigate what is specific to the failing machines, and I'll get back to you soon.
from plink-ng.
Hi,
Do you have any more information on this? I haven't attempted to replicate the crash yet, since if you only observe it on one type of machine it's important for me to match that.
from plink-ng.
Sorry about the delay. I investigated further, and I think the "File read failure" and "Sample file not found" errors only occurred when I ran the analysis (read/write operations) on a server disk that was, unbeknownst to me, around 95% full. So I presume that when storage capacity limits interfere with read/write operations, Plink outputs these cryptic messages. I wouldn't really know how the code should be updated to handle this error, though. After moving the analyses to a different disk, the problems disappeared.
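For anyone who hits the same thing, a quick pre-flight check of how full the working filesystem is can be sketched as below. This is a minimal sketch, not part of plink2: the helper names and the 90% threshold are arbitrary assumptions, and `shutil.disk_usage` may report a slightly different fill level than `df` on filesystems with reserved blocks.

```python
import shutil


def disk_fill_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total


def warn_if_nearly_full(path, threshold=0.90):
    """Print a warning and return True if the filesystem is at or above threshold.

    The 0.90 default is an arbitrary illustrative choice.
    """
    fraction = disk_fill_fraction(path)
    if fraction >= threshold:
        print(f"Warning: filesystem holding {path!r} is {fraction:.0%} full")
        return True
    return False


# Check the current working directory before launching a long job.
warn_if_nearly_full(".")
```

On a cluster, the check would need to run on the compute node itself (for example at the top of the job script), since different nodes may mount different disks.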
from plink-ng.
Okay. Plink doesn't write anything to disk at all during the .bgen scanning phase, so the "read error" message is probably accurate as far as it goes; the question is what's happening on the system that's causing the error to occur during reading, instead of writing as one would expect. It's probably virtual memory/swapping-related.
I will look into modifying plink2's read- (and write-)error messages so that if any more information is available about the error, that is also logged.
from plink-ng.
Read- and write-error messages now surface the original error message reported by the OS, as of the 9 Oct 2019 build.
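The general idea of attaching the OS-level reason to a generic read error can be illustrated as follows. This is a hedged sketch in Python, not plink2's actual code (plink2 is written in C/C++), and the message format in parentheses is a hypothetical one chosen for illustration; the exact wording plink2 prints may differ.

```python
import errno
import os


def describe_read_failure(exc):
    """Build a read-error message that includes the OS-reported reason.

    The "Error: File read failure (...)" format is hypothetical.
    """
    if exc.errno is not None:
        return f"Error: File read failure ({os.strerror(exc.errno)})."
    return f"Error: File read failure ({exc})."


def read_first_bytes(path, size=1024):
    """Return (data, None) on success or (None, message) on an OS-level failure."""
    try:
        with open(path, "rb") as handle:
            return handle.read(size), None
    except OSError as exc:
        return None, describe_read_failure(exc)


# "no_such_file.bgen" is a placeholder name that is assumed not to exist.
data, message = read_first_bytes("no_such_file.bgen")
print(message)
```

With the underlying reason in the log, a failure caused by the storage layer can be told apart from a malformed input file without having to reproduce the crash.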
from plink-ng.