
Comments (8)

karenfeng commented on June 9, 2024

Hi @jjfarrell, it looks like the path to the chain file was not passed properly in the expression. You’ll want to use string interpolation:

liftover_expr = f"lift_over_coordinates(contigName, start, end, {chain_file}, .99)"

I’ll clarify this in the docs. Thanks for bringing this up!

from glow.

jjfarrell commented on June 9, 2024

Thanks for the quick response @karenfeng!

However, I am now getting a different error message when I add the f"" and {chain_file}. Both error messages refer to (line 1, pos 46).

Traceback (most recent call last):
File "/share/pkg.7/spark/2.4.3/install/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/share/pkg.7/spark/2.4.3/install/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.functions.expr.
: org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '/' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'ANY', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'N 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'PIVOT', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'AFTEROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'DIRECTORY', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'COST', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IGNORE', 'BOTH', 'LEADING', 'TRAILING', 'IF', 'POSITION', 'EXTRACT', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT''DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INOUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSAEX', 
'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 46)

== SQL ==
lift_over_coordinates(contigName, start, end, /restricted/projectnb/casa/wgs.hg38/benchmark.giab/b37ToHg38.over.chain, .99)
----------------------------------------------^^^

    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:44)
    at org.apache.spark.sql.functions$.expr(functions.scala:1366)
    at org.apache.spark.sql.functions.expr(functions.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:745)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/restricted/projectnb/casa/wgs.hg38/benchmark.giab/glow_liftover.py", line 22, in
input_with_lifted_df = input_df.select('contigName', 'start', 'end').withColumn('lifted', expr(liftover_expr))
File "/share/pkg.7/spark/2.4.3/install/python/lib/pyspark.zip/pyspark/sql/functions.py", line 675, in expr
File "/share/pkg.7/spark/2.4.3/install/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call
File "/share/pkg.7/spark/2.4.3/install/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException: "\nextraneous input '/' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'ANY', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLILL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'PIVOT', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'CURRENT', 'FIRST', 'AFTER', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'DIRECTORY', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'COST', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', N', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IGNORE', 'BOTH', 'LEADING', 'TRAILING', 'IF', 'POSITION', 'EXTRACT', 'DIV', 'PEET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'E', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES',', 
'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 46)\n\n== SQL ==\nlift_over_coordinates(contigName, start, end, /restricted/projectnb/casa/wgs.hg38/benchmark.giab/b37ToHg38.over.chain, .--------------------------------------^^^\n"


karenfeng commented on June 9, 2024

Whoops, I forgot to wrap the chain file string in single-quotes to express it as a literal. Can you try this?
liftover_expr = f"lift_over_coordinates(contigName, start, end, '{chain_file}', .99)"
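
To illustrate why the quotes matter, here is a minimal sketch of building the expression string in plain Python. The helper name `build_liftover_expr` and the example path are hypothetical and not part of Glow. The point is only that the chain file path must reach the SQL parser as a quoted string literal, otherwise the parser stumbles on the leading `/`.

```python
def build_liftover_expr(chain_file: str, min_match_ratio: float = 0.99) -> str:
    """Return a SQL expression string with the chain file path as a string literal.

    Hypothetical helper for illustration; it only formats a string.
    """
    # A single quote inside the path would end the literal early, so reject it
    # here rather than guess at the parser's escaping rules.
    if "'" in chain_file:
        raise ValueError("chain file path must not contain single quotes")
    return (
        "lift_over_coordinates(contigName, start, end, "
        f"'{chain_file}', {min_match_ratio})"
    )


# Hypothetical path, used only to show the resulting string.
liftover_expr = build_liftover_expr("/data/b37ToHg38.over.chain")
print(liftover_expr)
```

With the quotes in place, the literal begins at offset 46 of the string, the same position the parser flagged in the traceback above. This only addresses the ParseException; whether the rest of the call succeeds depends on the Glow version in use.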


jjfarrell commented on June 9, 2024

I tried that and it also did not work. Next, I tried eliminating the expression, which also did not work.

Finally, I found that the errors were being generated by the syntax in the select part of the statement. The contigName, start, and end needed to be quoted: 'contigName', 'start', 'end'

input_with_lifted_df = input_df.select('contigName', 'start', 'end').withColumn('lifted', glow.lift_over_coordinates('contigName', 'start','end', chain_file, 0.99))

Now that I was past this issue, I tried the variant liftover:

output_df = glow.transform('lift_over_variants', input_df, chain_file=chain_file, reference_file=reference_file)

That triggered the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o63.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 8, scc-q20.scc.bu.edu, executor 14): htsjdk.tribble.TribbleException: Badly formed variant context at location chr1:596697; getEnd() was 596797 but this VariantContext contains an END key with value 532177

The original b37 vcf has a deletion here:

1 532077 ACATTCATGCTCACTCATACACACCCAGATCATATATACACTCGTGCACACATTCACACTCATACACACCCAAATCATACTCACATTCATGCACACATGTT A
SVLEN=-100;;SVTYPE=DEL;END=532177;sizecat=100to299;

The liftover to hg38 should look like this:
chr1 596697 REF=ACATTCATGCTCACTCATACACACCCAGATCATATATACACTCGTGCACACATTCACACTCATACACACCCAAATCATACTCACATTCATGCACACATGTT
ALT=A
INFO Fields
SVLEN=-100; SVTYPE=DEL;END=596797;sizecat=100to299;

The error message suggests the transformer is not updating the INFO/END field from 532177 to 596797 and an error is being triggered since the END is before the start. An incorrect INFO/END will cause problems with tabix and other programs.
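
The mismatch can be checked by hand. As a rough sketch in plain Python (not htsjdk's actual code), assume the end of a record with explicit alleles is POS plus the REF length minus one. The function names below are illustrative only. The REF allele here is taken to be 101 bases long: one anchor base plus the 100 deleted bases implied by SVLEN=-100.

```python
def end_from_ref_length(pos: int, ref_length: int) -> int:
    """Last reference base covered by a record that starts at 1-based `pos`."""
    return pos + ref_length - 1


def end_is_consistent(pos: int, ref_length: int, info_end: int) -> bool:
    """True when the INFO/END value agrees with POS and the REF length."""
    return info_end == end_from_ref_length(pos, ref_length)


# Assumption: 1 anchor base + 100 deleted bases (SVLEN=-100).
REF_LENGTH = 101

# Original b37 record: POS=532077, END=532177.
b37_ok = end_is_consistent(532077, REF_LENGTH, 532177)

# Lifted POS with the END value left over from b37.
stale_ok = end_is_consistent(596697, REF_LENGTH, 532177)

# Lifted POS with END updated to the hg38 coordinate.
lifted_ok = end_is_consistent(596697, REF_LENGTH, 596797)

print(b37_ok, stale_ok, lifted_ok)
```

Under that assumption, only the record with END updated to 596797 is self-consistent after liftover, which matches the two values reported in the exception.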

The VCF I am using to replicate this is the GIAB SV VCF. It can be downloaded from here:

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz


karenfeng commented on June 9, 2024

Good to hear you got the syntax issues ironed out - I'll clean up the docs to reflect this usage. I believe that the issue may be resolved by upgrading our dependency on Picard (broadinstitute/picard#1469). I'll try updating our build to see if this resolves the problem.


jjfarrell commented on June 9, 2024

@karenfeng I tested a relatively new version of GATK LiftoverVcf (gatk/4.1.7.0) and found the error was still there. So I submitted the issue to the GATK GitHub repository, from which it was migrated to the Picard repo. Here is the issue:
broadinstitute/picard#1552

So it would be best to hold off on the Picard upgrade until a new release is available that resolves this.


karenfeng commented on June 9, 2024

Thanks for investigating, @jjfarrell. I'll keep an eye on the thread and we'll bump our dependency when the issue is resolved.


jjfarrell commented on June 9, 2024

@henrydavidge

Why was this closed? Was the issue resolved?

