This issue will probably stay open for longer than the last one :)
I am interested in running pycscope on some Python 3 code so I can jump around in it more easily. I did a quick port of the code to Python 3. The git diff is attached at the end; here are some comments on what changed.
First I ran 2to3, which took care of the print statements, the string type checks and the exception syntax. A few other minor issues were also easy to fix:
- Removed the import of compiler (it was unused, and the module no longer exists in Python 3);
- Strings are now text of type str rather than byte strings. A couple of type comparisons against types.StringType were changed to str.
- There is one place where I had to change string.replace(sourcecode, ...) to sourcecode.replace(...).
- DOS line endings in the output file are best handled in P3 with the newline argument to open().
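As a rough illustration of the last two items, here is a minimal sketch of the Python 3 idioms involved. The helper names and the index.out file name are made up for the example and are not taken from pycscope.

```python
import os
import tempfile


def normalize_newlines(sourcecode):
    # The str method replaces the old string.replace() module function.
    return sourcecode.replace('\r\n', '\n')


def write_index(path, text):
    # newline='\n' tells Python 3 not to translate '\n' into os.linesep,
    # so the output has Unix line endings on every platform.
    with open(path, 'w', newline='\n') as fout:
        fout.write(text)


# Hypothetical output location, used only for this demonstration.
index_path = os.path.join(tempfile.mkdtemp(), 'index.out')
write_index(index_path, normalize_newlines('a = 1\r\nb = 2\r\n'))

# Read the file back in binary mode to inspect the raw line endings.
with open(index_path, 'rb') as fin:
    raw = fin.read()
print(raw)
```

Reading the file back in binary mode is just a way to check that no carriage returns were written.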
However, decoding the nonterminal symbols for augmented assignment had a problem that took quite a bit of work to figure out: the grammar for expr_stmt, and with it the parse tree, changed between Python 2 and Python 3. From grammar.html:
P3.2:
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*)
P2.7:
expr_stmt: testlist (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist))*)
The final fix was just to change one symbol comparison from testlist (for P2) to testlist_star_expr (for P3), and it has worked so far. While working through this, however, I also realized that the current implementation, which walks the raw parser output, is a bit hairy, and that several alternatives the grammar allows are not handled. For example, the augassign handling above only accepts testlist on the RHS (hardcoded in pycscope), but the grammar also allows yield_expr there.
Does the Python AST (the ast module) provide enough information for pycscope to work with, instead of dealing with the raw parse tree? That should simplify the logic and make the whole thing safer, with more complete coverage of the grammar. AST nodes carry lineno and col_offset. Cross compilers seem to use the AST rather than parser ST objects.
There is also the question of whether one code base should be maintained for both P2 and P3. The symbol decoding difference above is definitely not compatible between the two and needs to be handled differently depending on whether the code runs under P2 or P3.
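To give a feel for what the AST exposes, here is a small sketch (not pycscope code) that walks a module with ast and records definitions and name references together with their positions. The collect_references helper and the tuple layout are assumptions made for the example; a real cscope index needs more than this.

```python
import ast

SOURCE = (
    "def gen(total):\n"
    "    total += yield\n"
    "    count = total\n"
    "    return count\n"
)


def collect_references(source):
    """Return (kind, name, lineno, col_offset) tuples, sorted by position."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            found.append(('def', node.name, node.lineno, node.col_offset))
        elif isinstance(node, ast.Name):
            # Store context means the name is being assigned to.
            kind = 'assign' if isinstance(node.ctx, ast.Store) else 'use'
            found.append((kind, node.id, node.lineno, node.col_offset))
    return sorted(found, key=lambda item: (item[2], item[3]))


refs = collect_references(SOURCE)
for ref in refs:
    print(ref)
```

Note that the augmented assignment with a yield on the right-hand side needs no special case here: it is an ast.AugAssign node whichever expression form appears on the RHS.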
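If a single code base is preferred, one possible approach is to pick the LHS symbol at run time, depending on what the symbol module defines. The sketch below is only an assumption about how that could look. Both helper names are hypothetical, and the symbol tables are stand-ins with made-up numbers so the example does not depend on the real symbol module (which, like parser, was removed in Python 3.10).

```python
from types import SimpleNamespace


def augassign_lhs_symbol(symbol_module):
    # Python 3 grammars define testlist_star_expr; Python 2.7 only has testlist.
    return getattr(symbol_module, 'testlist_star_expr', symbol_module.testlist)


def is_augassign(cst, symbol_module):
    """True if cst looks like: expr_stmt -> LHS augassign (yield_expr|testlist)."""
    if len(cst) < 4:
        return False
    # Accept both RHS alternatives that the grammar allows.
    rhs_symbols = (symbol_module.testlist, symbol_module.yield_expr)
    return (cst[1][0] == augassign_lhs_symbol(symbol_module)
            and cst[2][0] == symbol_module.augassign
            and cst[3][0] in rhs_symbols)


# Stand-ins for the real symbol module; the numeric values are invented.
py2_symbols = SimpleNamespace(testlist=1, augassign=2, yield_expr=3)
py3_symbols = SimpleNamespace(testlist=1, augassign=2, yield_expr=3,
                              testlist_star_expr=4)

py2_stmt = (0, (1,), (2,), (1,))  # testlist, augassign, testlist
py3_stmt = (0, (4,), (2,), (3,))  # testlist_star_expr, augassign, yield_expr

print(is_augassign(py2_stmt, py2_symbols))
print(is_augassign(py3_stmt, py3_symbols))
```

Passing the symbol table in as an argument keeps the check testable without a particular interpreter version.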
Anyway, here is the patch for P3 that works for me:
diff --git a/pycscope.py b/pycscope.py
index 254d313..533ceb8 100755
--- a/pycscope.py
+++ b/pycscope.py
@@ -22,3 +22,3 @@ __usage__ = """Usage: pycscope.py [-R] [-t cnt] [-f reffile] [-i srclistfile] [f
import getopt, sys, os, os.path, string, types
-import keyword, parser, symbol, token, compiler
+import keyword, parser, symbol, token
from threading import Lock, Thread
@@ -94,3 +94,3 @@ def main():
except getopt.GetoptError:
- print __usage__
+ print(__usage__)
sys.exit(2)
@@ -111,4 +111,4 @@ def main():
threadCount = val
- except Exception, e:
- print __usage__
+ except Exception as e:
+ print(__usage__)
sys.exit(2)
@@ -117,3 +117,3 @@ def main():
if o == "-i":
- args.extend(map(string.rstrip, open(a, 'r').readlines()))
+ args.extend(list(map(str.rstrip, open(a, 'r').readlines())))
@@ -202,4 +202,4 @@ def work(basepath, gen, lock):
indexbuff_len = parseFile(basepath, fname, indexbuff, indexbuff_len, fnamesbuff)
- except SyntaxError, se:
- print "pycscope.py: %s: %s" % (se.filename, se)
+ except SyntaxError as se:
+ print("pycscope.py: %s: %s" % (se.filename, se))
pass
@@ -225,4 +225,4 @@ def workT(basepath, gen, lock):
indexbuff_len = parseFile(basepath, fname, indexbuff, indexbuff_len, fnamesbuff)
- except SyntaxError, se:
- print "pycscope.py: %s: %s" % (se.filename, se)
+ except SyntaxError as se:
+ print("pycscope.py: %s: %s" % (se.filename, se))
pass
@@ -288,3 +288,3 @@ def parseFile(basepath, relpath, indexbuff, indexbuff_len, fnamesbuff):
indexbuff_len = parseSource(filecontents, indexbuff, indexbuff_len)
- except SyntaxError, se:
+ except SyntaxError as se:
se.filename = fullpath
@@ -308,3 +308,3 @@ def replaceNodeType(treeList):
for i in range(1, len(treeList)):
- if type(treeList[i]) == types.ListType:
+ if type(treeList[i]) == list:
replaceNodeType(treeList[i])
@@ -324,3 +324,3 @@ class Symbol(object):
"""
- assert (mark == Mark.FUNC_END or name) and (type(name) == types.StringType), "Must have an actual symbol name as a string (unless marking function end)."
+ assert (mark == Mark.FUNC_END or name) and (type(name) == str), "Must have an actual symbol name as a string (unless marking function end)."
@@ -358,3 +358,3 @@ class Symbol(object):
else:
- print "Symbol(): does not have attribute <%s>" % name
+ print("Symbol(): does not have attribute <%s>" % name)
raise AttributeError(name)
@@ -368,3 +368,3 @@ class Symbol(object):
- def __nonzero__(self):
+ def __bool__(self):
""" Defined so that the interpretter won't invoke
@@ -385,3 +385,3 @@ class NonSymbol(object):
"""
- assert val and (type(val) == types.StringType), "Must have an actual string."
+ assert val and (type(val) == str), "Must have an actual string."
self.__text = str(val)
@@ -408,3 +408,3 @@ class Line(object):
def __init__(self, num):
- assert ((type(num) == types.IntType) or (type(num) == types.LongType)) and num > 0, "Requires a positive, non-zero integer for a line number"
+ assert (type(num) == int) and num > 0, "Requires a positive, non-zero integer for a line number"
self.lineno = num
@@ -692,3 +692,3 @@ def processNonTerminal(ctx, cst):
if (l >= 4):
- if (cst[1][0] == symbol.testlist) and (cst[2][0] == symbol.augassign) and (cst[3][0] == symbol.testlist):
+ if (cst[1][0] == symbol.testlist_star_expr) and (cst[2][0] == symbol.augassign) and (cst[3][0] == symbol.testlist):
# testlist, augassign, testlist
@@ -878,3 +878,3 @@ def walkCst(ctx, cst):
for i in range(len(cst)-1, 0, -1):
- if type(cst[i]) == types.TupleType:
+ if type(cst[i]) == tuple:
# Push it onto the processing stack
@@ -893,3 +893,3 @@ def parseSource(sourcecode, indexbuff, indexbuff_len):
# Parse the source to an Abstract Syntax Tree
- sourcecode = string.replace(sourcecode, '\r\n', '\n')
+ sourcecode = sourcecode.replace('\r\n', '\n')
if sourcecode[-1] != '\n':
@@ -912,3 +912,3 @@ def writeIndex(basepath, indexfn, indexbuff, fnamesbuff):
"""
- fout = open(os.path.join(basepath, indexfn), 'wb')
+ fout = open(os.path.join(basepath, indexfn), 'w', newline='\n')