Comments (8)
Hi @jjfarrell, it looks like the path to the chain file was not passed properly in the expression. You’ll want to use string interpolation:
liftover_expr = f"lift_over_coordinates(contigName, start, end, {chain_file}, .99)"
I’ll clarify this in the docs. Thanks for bringing this up!
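For example, with a hypothetical path, the interpolated string looks like this (note that the value of chain_file is inserted verbatim into the SQL text):

chain_file = "/path/to/b37ToHg38.over.chain"  # hypothetical path for illustration
liftover_expr = f"lift_over_coordinates(contigName, start, end, {chain_file}, .99)"
print(liftover_expr)
# lift_over_coordinates(contigName, start, end, /path/to/b37ToHg38.over.chain, .99)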
Thanks for the quick response @karenfeng!
However, I am now getting a different error message after adding the f-string and {chain_file} interpolation. Both error messages point to (line 1, pos 46).
Traceback (most recent call last):
File "/share/pkg.7/spark/2.4.3/install/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/share/pkg.7/spark/2.4.3/install/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.functions.expr.
: org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '/' expecting {'SELECT', 'FROM', 'ADD', 'AS', ... (long SQL keyword list elided) ..., IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 46)
== SQL ==
lift_over_coordinates(contigName, start, end, /restricted/projectnb/casa/wgs.hg38/benchmark.giab/b37ToHg38.over.chain, .99)
----------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:44)
at org.apache.spark.sql.functions$.expr(functions.scala:1366)
at org.apache.spark.sql.functions.expr(functions.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/restricted/projectnb/casa/wgs.hg38/benchmark.giab/glow_liftover.py", line 22, in
input_with_lifted_df = input_df.select('contigName', 'start', 'end').withColumn('lifted', expr(liftover_expr))
File "/share/pkg.7/spark/2.4.3/install/python/lib/pyspark.zip/pyspark/sql/functions.py", line 675, in expr
File "/share/pkg.7/spark/2.4.3/install/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call
File "/share/pkg.7/spark/2.4.3/install/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException: "\nextraneous input '/' expecting {'SELECT', 'FROM', 'ADD', 'AS', ... (long SQL keyword list elided) ..., IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 46)\n\n== SQL ==\nlift_over_coordinates(contigName, start, end, /restricted/projectnb/casa/wgs.hg38/benchmark.giab/b37ToHg38.over.chain, .99)\n----------------------------------------------^^^\n"
Whoops, I forgot to wrap the chain file string in single-quotes to express it as a literal. Can you try this?
liftover_expr = f"lift_over_coordinates(contigName, start, end, '{chain_file}', .99)"
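A minimal sketch of the full pattern, using a hypothetical path (input_df is the DataFrame from earlier in the thread):

from pyspark.sql.functions import expr

chain_file = "/path/to/b37ToHg38.over.chain"  # hypothetical path
liftover_expr = f"lift_over_coordinates(contigName, start, end, '{chain_file}', .99)"
# The single quotes make the interpolated path a SQL string literal:
# lift_over_coordinates(contigName, start, end, '/path/to/b37ToHg38.over.chain', .99)
lifted_df = input_df.withColumn('lifted', expr(liftover_expr))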
I tried that, and it also did not work. Next, I tried eliminating the expression, which also did not work.
Finally, I found that the errors were being generated by the syntax in the select part of the statement. The column names needed to be quoted: 'contigName', 'start', 'end'
input_with_lifted_df = input_df.select('contigName', 'start', 'end').withColumn('lifted', glow.lift_over_coordinates('contigName', 'start', 'end', chain_file, 0.99))
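For anyone hitting the same problem, here is a self-contained sketch of this working column-based pattern, with hypothetical paths:

import glow
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark = glow.register(spark)  # per current Glow docs; older releases registered in place and returned None

chain_file = '/path/to/b37ToHg38.over.chain'  # hypothetical path
input_df = spark.read.format('vcf').load('/path/to/input.vcf.gz')  # hypothetical path

# Column names are passed as quoted strings; the chain file path and
# minimum match ratio are plain Python arguments.
input_with_lifted_df = (
    input_df.select('contigName', 'start', 'end')
            .withColumn('lifted',
                        glow.lift_over_coordinates('contigName', 'start', 'end',
                                                   chain_file, 0.99))
)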
Now that I was past this issue, I tried the variant liftover:
output_df = glow.transform('lift_over_variants', input_df, chain_file=chain_file, reference_file=reference_file)
That triggered the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o63.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 8, scc-q20.scc.bu.edu, executor 14): htsjdk.tribble.TribbleException: Badly formed variant context at location chr1:596697; getEnd() was 596797 but this VariantContext contains an END key with value 532177
The original b37 vcf has a deletion here:
1 532077 ACATTCATGCTCACTCATACACACCCAGATCATATATACACTCGTGCACACATTCACACTCATACACACCCAAATCATACTCACATTCATGCACACATGTT A
SVLEN=-100;;SVTYPE=DEL;END=532177;sizecat=100to299;
The liftover to hg38 should look like this:
chr1 596697 REF=ACATTCATGCTCACTCATACACACCCAGATCATATATACACTCGTGCACACATTCACACTCATACACACCCAAATCATACTCACATTCATGCACACATGTT
ALT=A
INFO Fields
SVLEN=-100; SVTYPE=DEL;END=596797;sizecat=100to299;
The error message suggests the transformer is not updating the INFO/END field from 532177 to 596797, so an error is triggered because the END coordinate falls before the new start. An incorrect INFO/END will cause problems with tabix and other downstream programs.
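If the transform completes on a given input, one quick sanity check is to look for records whose END precedes the new start. This is a sketch that assumes Glow's VCF reader surfaces the END key as a flattened INFO_END column (verify the column name against your schema):

# Hypothetical check: any row where INFO_END < start indicates an
# END key that was not updated during liftover.
output_df.filter('INFO_END IS NOT NULL AND INFO_END < start') \
         .select('contigName', 'start', 'INFO_END') \
         .show()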
To replicate this, I am using the GIAB SV VCF. It can be downloaded from here:
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz
Good to hear you got the syntax issues ironed out - I'll clean up the docs to reflect this usage. I believe that the issue may be resolved by upgrading our dependency on Picard (broadinstitute/picard#1469). I'll try updating our build to see if this resolves the problem.
@karenfeng I tested a relatively new version of GATK LiftoverVcf (gatk/4.1.7.0) and found the error was still there. So I submitted the issue to the GATK GitHub repository, where it was migrated to the Picard repo. Here is the issue:
broadinstitute/picard#1552
So it would be best to hold off on the Picard upgrade until a new release is available that resolves it.
Thanks for investigating, @jjfarrell. I'll keep an eye on the thread, and we'll bump our dependency when the issue is resolved.
Why was this closed? Was the issue resolved?