dgorissen / pycel Goto Github PK
View Code? Open in Web Editor NEWA library for compiling excel spreadsheets to python code & visualizing them as a graph
License: GNU General Public License v3.0
A library for compiling excel spreadsheets to python code & visualizing them as a graph
License: GNU General Public License v3.0
need to detect & bugfix ranges consisting of single cells, e.g. K10:K10
Hi,
This looks like a very nice tool, I am looking forward to see if I can make use of it. One thing I did not manage when I tried the example, is to update a cell, although admin rights were given.
Another thing I was wondering about is a quick way to get the python code. Any easy way I could get around to it you think?
Thanks for sharing pycel,
Petros
PS: I am running the Anaconda 1.9.2 (64-bit) pack with ipython 2.7 on Windows8 with MSOffice13.
Edit:
I started extracting python code from cells, after looking more carefully into the API.
My best result so far was to grab hold of a cell from the dictionary returned by cellmap, a member of class Spreadsheet, in excelcompiler.py.
Then the python code for that cell can be extracted by the method cell2code in class ExcelCompiler, again in excelcompiler.py.
Pseudocode (with respect to the example):
cell_code = c.cell2code(sp.cellmap['Sheet!Cell'])
Updating cells on a loaded graph also works nicely.
I have the next Issue:
AssertionError Traceback (most recent call last)
in
----> 1 excel.set_value('Estados Financieros (1)!F377', 1200)
2 print("D1 is {}".format(excel.evaluate('Resumen (1)!D13')))
~\Anaconda3\lib\site-packages\pycel\excelcompiler.py in set_value(self, address, value, set_as_range)
417 elif address not in self.cell_map:
418 address = AddressRange.create(address).address
--> 419 assert address in self.cell_map
Please help me
In the excellib module, following statement provides list of functions:
FUNCTION_MAP = {
"ln":"xlog",
"min":"xmin",
"min":"xmin",
"max":"xmax",
"sum":"xsum",
"gammaln":"lgamma"
}
Xmin is declared twice and there is no def lgamma()
In addition, list of functions in the module is longer, here are those not used in the code:
def value(text)
def average(_args)
def right(text, n)
def index(_args)
def lookup(value, lookup_range, result_range)
def linest(_args, *_kwargs)
def npv(*args)
Is my understanding correct or am I missing somehting?
Hi,
when I launch the cell compilation (Compile selection) from excel 2007 using the pyxll add-in I get the following error (it pop-up one menu):
—————————
PyXLL Error
—————————
Error calling function for menu item Compile selection
[global name 'excellib' is not defined]
—————————
OK
—————————
I don't know how to solve this problem, which is related with the global/local role of the file excellib. Thanks a lor for the help
If I compute an averageif for the following series: (1,5,3,4,5), I end up with an incorrect result of 4.
I am not sure why this is happening but believe the duplicate 5 is being dropped in the calculation.
Formula:
=(1+gen_z)/(1+InputData!$G$50)-1
Contains an Excel Named Range called 'gen_z'
Throws error:
Exception: Invalid address format gen_z
Any idea on how to fix this?
Have you taken a look at openpyxl for parsing xlsx
files? I realize it can't handle the old xls
format, but I think they've figured out their formula-parsing problems from earlier. Would be nice to remove the COM requirement.
>>> excel.evaluate("Sheet1!A27")
Traceback (most recent call last):
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
ret_val = excel_formula.compiled_lambda()
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined
Eval: sqrt(power(_C_("Sheet1!F37"), 2) + power(_C_("Sheet1!G37"), 2))
Traceback (most recent call last):
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
ret_val = excel_formula.compiled_lambda()
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
ret_val = excel_formula.compiled_lambda()
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
### Traceback will show this line if not loaded from a text file
File "X:\Python\lib\site-packages\pycel\excelcompiler.py", line 619, in _evaluate
value = self.eval(cell.formula)
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 805, in eval_func
exc=FormulaEvalError)
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 749, in error_logger
raise exc(error_msg)
pycel.excelformula.FormulaEvalError: Traceback (most recent call last):
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
ret_val = excel_formula.compiled_lambda()
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined
Eval: sqrt(power(_C_("Sheet1!F37"), 2) + power(_C_("Sheet1!G37"), 2))
Eval: xif(abs(_C_("Sheet1!H37")) < abs(_C_("Sheet1!C37")), 1, 0)
Traceback (most recent call last):
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
ret_val = excel_formula.compiled_lambda()
File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined
I have an excel file (call it "test.xlsx") with the following layout:
Sheet1!A1 -> =NPV(0.1,B1:C1)
Sheet1!B1 -> 10
Sheet1!C1 -> 20
I am attempting to evaluate cell Sheet1!A1 with the following code in test.py
from pycel import ExcelCompiler
excel = ExcelCompiler(filename="test.xlsx")
print("A1 is %s" % excel.evaluate('Sheet1!A1'))
excel.set_value('Sheet1!B1', 200)
print('A1 is now %s' % excel.evaluate('Sheet1!A1'))
This produces a value error I believe caused by passing a range instead of a comma-separated list of cells:
env ❯ python test.py
A1 is 25.6198347107438
A1 is now #VALUE!
I would expect this to evaluate A1 to 198.347 and produce the following console log:
env ❯ python test.py
A1 is 25.6198347107438
A1 is now 198.3471074380165
If I change the formula in cell A1 to =NPV(0.1, B1,C1)
this does evaluate correctly
env ❯ pip freeze --local
decorator==4.4.1
et-xmlfile==1.0.1
jdcal==1.4.1
networkx==2.4
numpy==1.18.1
openpyxl==3.0.3
pycel==1.0b22
python-dateutil==2.8.1
ruamel.yaml==0.16.10
ruamel.yaml.clib==0.2.0
six==1.14.0
env ❯ python -V
Python 3.7.5
>>> import platform
>>> platform.platform()
Darwin-19.3.0-x86_64-i386-64bit'
I've been using Pycel for a few weeks. I am trying to change input on 1 sheet in one workbook, the only thing is that the value doesn't get updated on the sheet that I want to change it on.
Is there any way that I can load multiple sheets in 1 graph?
Cheers
Hi,
thanks for this piece of software!
Execution of LINEST function crashes. Simple test case attached.
Loading test2.xlsx...
Traceback (most recent call last):
File "compileExcel4.py", line 12, in
print("D1 is {}".format(excel.evaluate('Sheet1!D1')))
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 806, in _evaluate_non_iterative
self._gen_graph(address)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 875, in _gen_graph
self._process_gen_graph()
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 887, in _process_gen_graph
self._gen_graph(precedent_address, recursed=True)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 871, in _gen_graph
self._make_cells(seed)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 717, in _make_cells
new_nodes = build_range(excel_data)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 701, in build_range
for addr in a_range.needed_addresses:
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 1031, in needed_addresses
return self.formula and self.formula.needed_addresses or iter(self)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 614, in needed_addresses
if self.python_code:
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 636, in python_code
self._python_code = self.ast.emit
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 442, in emit
return handler()
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 481, in func_linest
degree, coef = get_linest_degree(self.cell)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelutil.py", line 864, in get_linest_degree
address.address_at_offset(row_inc=0, col_inc=i))
AttributeError: 'AddressRange' object has no attribute 'address_at_offset'
D1 is 0.9224
The Excel work book contain a few number in A1:B5 and the LINEST-function
{=LINEST(A1:A5,B1:B5)}
It has been created in Libreoffice and saved in xlsx-format. I am happy to provide it.
from pycel import ExcelCompiler
filename = "test2.xlsx"
print("Loading {}...".format(filename))
# load & compile the file to a graph
excel = ExcelCompiler(filename=filename)
#print("Setting B1 to 0.6")
#excel.set_value('Sheet1!B1', 0.6)
print("D1 is {}".format(excel.evaluate('Sheet1!D1')))
Pycel 1.0b22, Python 3.5.2 and Ubuntu 16.04.
Hello again.
I think I may have run into another small bug and that is when concatenating text with numbers.
the excel formula:
="some text" & A17 & "more text"
where A17 is a number,
is translated to:
'some text' + eval_cell(A17) + 'more text',
which returns an error as is a concatenation of str with int/float.
I worked around this by wrapping the TEXT() function around A17 in the excel version of the formula and mapping text -> str in pycel.
Thanks in advance,
Petros
While the difference may be so great that it would make more sense to start from scratch: What would it take to make pycel work with LibreOffice Calc?
excelformula._parse_to_rpn raise Exception with the formula "= (1,5 * (1 + B11 *B3 ^ B12) + 5) + 10" and worked correctly when I removed the space after the equal sign: "=(1,5 * (1 + B11 *B3 ^ B12) + 5) + 10". Excel works fine in either case.
Just trying to run the example.
Win 7, Excel 2010, Python 2.7.3 |EPD_free 7.3-2 (32-bit)
Message File Name Line Position
Traceback
C:\Research\Test\pycel-master\Test\example.py 22
gen_graph C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 632
cell2code C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 570
shunting_yard C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 477
create_node C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 339
init C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 258
exceptions.NameError: global name 'excellib' is not defined
Any suggestions?
There is a reference to undefined attribute. Should it be ._node
instead?
pycel/src/pycel/excelcompiler.py
Lines 677 to 678 in 146e198
Same in
pycel/src/pycel/excelformula.py
Line 245 in 146e198
Running the example code gives me an error, which could be traced back to the 'ranged names' code in 'excelwrapper.py' :
self.rangednames = np.zeros(shape = (int(self.app.ActiveWorkbook.Names.Count),1), dtype=[('id', 'int_'), ('name', 'S200'), ('formula', 'S200')]) + for i in range(0, self.app.ActiveWorkbook.Names.Count): + self.rangednames[i]['id'] = int(i+1) + self.rangednames[i]['name'] = str(self.app.ActiveWorkbook.Names.Item(i+1).Name) + self.rangednames[i]['formula'] = str(self.app.ActiveWorkbook.Names.Item(i+1).Value)
In many cases at the initial stage the app property hasn't been initialized, so that app is None. Hence, the error :
File "C:\pycel-master\src\pycel\excelwrapper.py", line 30, in __init__ self.rangednames = np.zeros(shape = (int(self.app.ActiveWorkbook.Names.Count),1), dtype=[('id', 'int_'), ('name', 'S200'), ('formula', 'S200')]) AttributeError: 'NoneType' object has no attribute 'ActiveWorkbook'
I have tested this issue on multiple complex models that use iterations and keep coming up against this error.
File "C:\Users\a\.conda\envs\pycel\lib\site-packages\pycel\excelutil.py", line 1317, in done
return (self.ns.iteration_number >= self.ns.iterations or
TypeError: '>=' not supported between instances of 'int' and 'NoneType
Any advice or suggestions would be helpful on why this error occurs,
I got a large xlsx file (621KB) and many excel formulas(mid/vlookup/or something) cross using in it .
Tried to run the example code, and got error msg like:
Traceback (most recent call last):
File "<ipython-input-1-d87f46d59f95>", line 1, in <module>
runfile('C:/Users/Administrator/Desktop/file/example2.py', wdir='C:/Users/Administrator/Desktop/file')
File "d:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "d:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Administrator/Desktop/file/example2.py", line 34, in <module>
print("D1 is %s" % excel.evaluate('S1!C60'))
File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 652, in evaluate
self._gen_graph(address)
File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 685, in _gen_graph
self._process_gen_graph()
File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 697, in _process_gen_graph
self._gen_graph(precedent_address, recursed=True)
File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 681, in _gen_graph
self._make_cells(seed)
File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 572, in _make_cells
new_cells = build_range(address)
File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 551, in build_range
excel_range = self.excel.get_range(rng)
File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelwrapper.py", line 186, in get_range
cells_dataonly = sheet_dataonly[address.coordinate]
File "d:\ProgramData\Anaconda3\lib\site-packages\openpyxl\worksheet\_read_only.py", line 41, in __getitem__
return meth(key)
File "d:\ProgramData\Anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py", line 292, in __getitem__
cols = tuple(self.iter_cols(min_col, max_col))
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'
Which might be caused by readonly mode in openpyxl.
How to fix this? Say, turn to writeonly mode?
All the calculation has been set up using the MS Excel interface ,I just want to try change some numbers and regenerate the particular Cells to get the output. I.E. to use the xlsx file as a calculate engine.
I might try to edit the xlsx data and use win32com to open the xlsx file and refresh the output cell, But it is a little slow rather than 50ms.
Any suggestions?
File "/Users/andrei/parsers/xlsx.py", line 141, in _load_pycel_workbook
return ExcelCompiler(filename)
File "/Users/andrei/venv/lib/python3.8/site-packages/pycel/excelcompiler.py", line 95, in __init__
wb_cycles = bool(self.excel.workbook.calculation.iterate)
AttributeError: 'NoneType' object has no attribute 'iterate'
It should work.
When a workbook package is missing calcPr
property then ExcelCompiler is failing to process the document.
In the attached test file, the vlookup formula in G4 in excel returns 20, but pycel returns the last value found of 50.
In the event an iferror formula does not have a second argument but has a valid first argument, I am getting a TypeError: iferror() missing 1 required positional argument: 'value_if_error' Eval: iferror(_C_("Sheet1!A1") / _C_("Sheet1!A2"))
For e.g. a simple formula such as =iferror(12/3,) which works in excel will result in the above error.
If a range containing an empty cell or cells were to be passed to either COUNT or COUNTIF, I get a TypeError as follows:
TypeError: unorderable types: int() >= NoneType()
Line 990 in 2c4d198
Traceback (most recent call last):
File "\PyCel\venv\lib\site-packages\pycel\excellib.py", line 190, in date
year, month, day = normalize_year(year, month, day)
File "\PyCel\venv\lib\site-packages\pycel\excelutil.py", line 913, in normalize_year
days_in_month = get_max_days_in_month(m, y)
File "\PyCel\venv\lib\site-packages\pycel\excelutil.py", line 892, in get_max_days_in_month
return calendar.monthrange(year, month)[1]
File "\bin\python3.7\lib\calendar.py", line 124, in monthrange
raise IllegalMonthError(month)
calendar.IllegalMonthError: bad month number 0; must be 1-12
Please have a look at:
I tried to run the pycel example on binder but it gave an error ``NameError: name 'c' is not defined`. I think the example is missing some parts.
best,
Miguel
Microsoft Excel currently includes functionality to perform iterative calculations when formulas contain circular references. By default further processing of circular formulas is broken after 100 iterations or once the value of the target cell changes by less than 0.01.
Current evaluation of pycel on formula with circular references does not take number of iterations or minimum value change into account during its execution. The result is an infinite cyclical execution that only breaks once the maximum recursion depth is reached.
Would there be any way to add this functionality or otherwise provide direction to proper execution state locations so that this functionality can be added via PR?
Thanks for the incredible library!
@thecapacity In a72203a you change the GPL-v3 licensing of this project to Public-Domain.
IANAL, but I believe you are not entitled to that, although your motives are kind.
My understanding is this:
LICENSE
file.Practically, the project has to remain under GPLv3 and not under the public domain.
Dirk, would you be OK with re-licensing the parsing and tokenizing code under a license which is compatible with openpyxl?
The excel formula result of SUM()/COUNTIFS
is incorrect for specific case:
=SUM(B15*Configuration!B4,B16*Configuration!B5,B17*Configuration!B6,B18*Configuration!B7,B19*Configuration!B8)
=COUNTIFS(Result!J:J,"Highest",Result!E:E,"fail")
The excel formula result should be -24
instead of 0
ExcelCompiler
evaluate()
from pycel import ExcelCompiler
cp = ExcelCompiler(filename=PATH("data/TestData.xlsx"))
print("Report!B8: ", cp.evaluate('Report!B8'))
print("Report!B23: ", cp.evaluate('Report!B23'))
print("Report!B25: ", cp.evaluate('Report!B25'))
windows7 64bit
pycel: 1.0b16
python: 3.6.3
I have a 2.11MB of Excel file. When using pycel for evaluating the formula. The script's RAM consumption is 4GB+.
When I evaluate a ~400kb's Excel file the RAM consumption is just 200MB+
I would like to know how does evaluate
work. Does it read through every single cell?
How can I prevent high RAM consumption when evaluating a formula?
Hello,
I returned to using pycel recently and noticed that I get wrong results in this example:
excel formula:
=IF(A1<=3;0;1)
where A1 is an empty cell.
The above formula will return 0 in microsoft excel, as unacceptable as it may be. My guess is that excel treats empty cells as 0 and not as NULL (None in Python).
Pycel, like any sane program, will assign None value to A1 and the above formula will return 1.
Unfortunately, the excel version is the accepted behavior and I need to locate in pycel, where an empty cell is translated into a None value and turn that into 0.
Any pointers to where I should start looking?
Thanks for any suggestions,
Petros
I am trying to see if there is an easy way to get a list of all dependencies for a particular cell. I understand I could view the graph, but I'd like to get the dependencies in a linear form (list, dictionary, tuple etc.) for further analysis.
Which function can I call in the ExcelCompiler class to obtain this for a particular cell?
I tried using the cell_map function but I don't believe it lays out ALL the dependencies for a given cell. From what I can tell, its simply displaying the formula in a given cell.
I see there is a needed_addresses function in the _Cell class, however I am not clear how to access that function from the ExcelCompiler instance.
Any help would be appreciated!
Dear dgorissen,
I'm a fan of your excellent work with Pycel! I am interested in contributing to the codebase, first by creating a version of the AVERAGEIFS function in Excel. Would you accept such a pull request? Where would you recommend getting started? Thanks!
Hello,
New user of python. I am trying to run the example given with pycel.
I set up an environment with Python 2.7, install pycell, network, numpy, matplotlip dependencies.
In a Jupyter notebook, when I run:
print "Compiling..., starting from D1"
sp = c.gen_graph('D1',sheet='Sheet1')
I get the error:
Compiling..., starting from D1
Seed D1 expanded into 1 cells
1 filtered seeds
1 cells on the todo list
Handling Sheet1!D1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-bf6729dbb0ea> in <module>()
1 print("Compiling..., starting from D1")
----> 2 sp = c.gen_graph('D1', sheet='Sheet1')
C:\[...]anaconda3\envs\pycel\lib\site-packages\pycel-0.0.1-py2.7.egg\pycel\excelcompiler.pyc in gen_graph(self, seed, sheet)
639
640 # parse the formula into code
--> 641 pystr,ast = self.cell2code(c1)
642
643 # set the code & compile it (will flag problems sooner rather than later)
C:\[...]\anaconda3\envs\pycel\lib\site-packages\pycel-0.0.1-py2.7.egg\pycel\excelcompiler.pyc in cell2code(self, cell)
578 if cell.formula:
579 e = shunting_yard(cell.formula or str(cell.value))
--> 580 ast,root = build_ast(e)
581 code = root.emit(ast,context=Context(cell,self.excel))
582 else:
C:\[...]anaconda3\envs\pycel\lib\site-packages\pycel-0.0.1-py2.7.egg\pycel\excelcompiler.pyc in build_ast(expression)
540 # G.add_edge(stack.pop(),n)
541 else:
--> 542 G.add_node(n,{'pos':0})
543
544 stack.append(n)
TypeError: add_node() takes exactly 2 arguments (3 given)
Sounds very basic, but I can't figure out what is wrong? Not correct version of networkx?
Pycel raises a exception on rectangular intervals if they are written as e.g., Sheet1!A1:Sheet1!A9
instead of Sheet1!A1:A9
. Both are exactly the same interval.
Both intervals should nor raise exceptions,since they are the same.
Pycel is interpreting intervals defined as e.g. Sheet1!A1:Sheet1!A9
as non rectangular. They are rectangular if the sheet reference is the same on both ends of the range.
I attached a test file Test Interval.xlsx to repeat the problem.
The code to reproduce the problem is
from pycel import ExcelCompiler
filename = 'Test Interval.xlsx'
model = ExcelCompiler(filename)
# This is a rectangular formula
model.evaluate('Sheet2!A1')
My traceback was
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-1-35ef567d212e> in <module>
5
6 # This is a rectangular formula
----> 7 model.evaluate('Sheet2!A1')
c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in _evaluate_non_iterative(self, address)
804
805 if address.address not in self.cell_map:
--> 806 self._gen_graph(address)
807
808 result = self._evaluate(str(address))
c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in _gen_graph(self, seed, recursed)
873 if not recursed:
874 # if not entered to process one cell / cellrange process other work
--> 875 self._process_gen_graph()
876
877 def _process_gen_graph(self):
c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in _process_gen_graph(self)
883 self.log.debug("Handling {}".format(dependant.address))
884
--> 885 for precedent_address in dependant.needed_addresses:
886 if precedent_address.address not in self.cell_map:
887 self._gen_graph(precedent_address, recursed=True)
c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in needed_addresses(self)
1069 @property
1070 def needed_addresses(self):
-> 1071 return self.formula and self.formula.needed_addresses or ()
1072
1073
c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in needed_addresses(self)
612 if self._needed_addresses is None:
613 # get all the cells/ranges this formula refers to, and remove dupes
--> 614 if self.python_code:
615 code = iter((self.python_code.encode(),))
616 tokens = tuple(tk.tokenize(lambda: next(code)))
c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in python_code(self)
634 self._python_code = ''
635 else:
--> 636 self._python_code = self.ast.emit
637 return self._python_code
638
c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in emit(self)
445 # map to the correct name
446 return "{}({})".format(
--> 447 self.func_map.get(func, func), self.comma_join_emit())
448
449 @staticmethod
c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in comma_join_emit(self, fmt_str, to_emit)
422 to_emit = self.children
423 if fmt_str is None:
--> 424 return ", ".join(n.emit for n in to_emit)
425 else:
426 return ", ".join(
c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in <genexpr>(.0)
422 to_emit = self.children
423 if fmt_str is None:
--> 424 return ", ".join(n.emit for n in to_emit)
425 else:
426 return ", ".join(
c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in emit(self)
344 @property
345 def emit(self):
--> 346 return self._emit()
347
348 def _emit(self, value=None):
c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in _emit(self, value)
354 try:
355 addr_str = value.replace('$', '')
--> 356 address = AddressRange.create(addr_str, sheet=sheet, cell=self.cell)
357 except ValueError:
358 # check for table relative address
c:\users\rodrigo\documents\github\pycel\src\pycel\excelutil.py in create(cls, address, sheet, cell)
344 return address
345
--> 346 sheetname, addr = split_sheetname(address, sheet=sheet)
347 addr_tuple, sheetname = range_boundaries(
348 addr, sheet=sheetname, cell=cell)
c:\users\rodrigo\documents\github\pycel\src\pycel\excelutil.py in split_sheetname(address, sheet)
547 sh, address_part = address.split('!', maxsplit=1)
548 if '!' in address_part:
--> 549 raise NotImplementedError(
550 "Non-rectangular formulas: {}".format(address))
551 sh = unquote_sheetname(sh)
NotImplementedError: Non-rectangular formulas: Sheet1!A1:'Sheet1'!A9
I ran into tokenize.TokenError: ('EOF in multi-line statement', (2, 0))
when trying to evaluate cells in a large workbook. After some detective work I narrowed it down to a cell containing this formula:
=SUMIF(B12:B15, ">""", C12:C15)
The purpose of this formula is to only add up cells in the C-range whose corresponding cells in the B-range contain a non-whitespace string. Here is a minimal reproducible example spreadsheet:
Cell C1 contains the formula =SUMIF(A1:A5, ">""", B1:B5)
. Running this code:
from pycel import ExcelCompiler
fname = "test.xlsx"
excel = ExcelCompiler(filename=fname)
print(excel.evaluate("Sheet1!C1"))
I get the traceback:
$ "c:/Users/ckp95/Documents/pycel_test/.venv/Scripts/python.exe" "c:/Users/ckp95/Documents/pycel_test/foo.py"
Traceback (most recent call last):
File "c:/Users/ckp95/Documents/pycel_test/foo.py", line 6, in <module>
print(excel.evaluate("Sheet1!C1"))
File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 808, in _evaluate_non_iterative
self._gen_graph(address)
File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 877, in _gen_graph
self._process_gen_graph()
File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 887, in _process_gen_graph
for precedent_address in dependant.needed_addresses:
File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 1073, in needed_addresses
return self.formula and self.formula.needed_addresses or ()
File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelformula.py", line 616, in needed_addresses
tokens = tuple(tk.tokenize(lambda: next(code)))
File "C:\Users\ckp95\.pyenv\pyenv-win\versions\3.7.2\lib\tokenize.py", line 579, in _tokenize
raise TokenError("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (2, 0))
I'm guessing this is because the parser doesn't know how to deal with escaped double-quotes.
It should print 6
.
(Also it would be nice if the TokenError
traceback told me which cell was the problem. For the original sheet I had to add print debug statements into the excelformula.py
file to figure out which intermediate cell in the calculation chain was triggering it)
OS: Windows 10
Python version: 3.7.2
Pycel version: 1.0b22
Hi,
I am thinking of using pycel for a excel-to-python conversion model. Using python 2.7, in order to make sure everything is compatible, I ran into a few classic foreign character issues:
i.e.
ValueError 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
To make play nice, I have restricted my sheet names to be standard latin-character based and made the following changes to excelutil.py, to enable foreign characters to cells:
@@ -81,15 +81,15 @@ class Cell(object):
(-) self.__sheet = str(sheet)
(-) self.__formula = str(formula) if formula else None
(+) self.__sheet = sheet.encode('utf-8')
(+) self.__formula = formula.encode('utf-8') if formula else None
self.__sheet = sh
self.__col = c
self.__row = int(r)
self.__col_idx = col2num(c)
( -) self.value = str(value) if isinstance(value,unicode) else value
(+) self.value = value.encode('utf-8') if isinstance(value,unicode) else value
self.python_expression = None
self._compiled_expression = None
Now it seems like the gen_graph and evaluation work nicely, but creating a graph from the derived object still fails.
Thanks,
Petros
Please note that Pycel deals with a file with hundreds of formula very well until:
I get to this cell called InputData!H50 the Excel is as follow:
=IFERROR(IF(year_modelStart>=year_baseCosts;2%;AVERAGE(L50:OFFSET(K50;0;MATCH(year_baseCosts;CA_Years;0))));2%)
New RangedNames feature deals with it properly and delivers to Python:
=IFERROR(IF('InputData'!$G$14>='InputData'!$G$15,2%,AVERAGE(L50:OFFSET(K50,0,MATCH('InputData'!$G$15,'InputData'!$L$5:$DG$5,0)))),2%)
So far so good but at some point I hit the wall in the emit function:
ss = args[0].emit(ast,context=context) + op + args[1].emit(ast,context=context)
IndexError, list out of range
With basically args[1].emit(ast,context=context) being out of range for some reason I cannot understand (it has passed dozens of if_error and it never called the def if_error function I created).
Not to be deterred, I add to my code the following try/except to by-pass eventual index issues (not a great idea when dealing with great code I'd suppose, but can one really resist looking over the hill?):
try:
ss = args[0].emit(ast,context=context) + op + args[1].emit(ast,context=context)
except (IndexError):
ss = args[0].emit(ast,context=context) + op
Without surprise it comes back to haunt me:
Exception: Failed to compile cell InputData!H50 with expression iferror((2% if eval_cell("InputData!G14")>=(eval_cell("InputData!G15") if eval_cell("InputData!G15") is not None else float('inf')) else average(l50:offset(eval_cell("InputData!K50"),0,match(eval_cell("InputData!G15"),eval_range("InputData!L5:DG5"),0)))),2%): invalid syntax (, line 1)
Any idea where to start?
I added support for the PV function (https://support.office.com/en-us/article/pv-function-23879d31-0e02-4321-be01-da16e8168cbd) to pycel and want to contribute the changes back to the repository.
Facing some errors with MATCH and VLOOKUP in the attached sheet.
If I set value for C11 to "Arena":
processing a xlsx file like this
a f a f
b g
c h
d i
e j
The 'f' on sheet1!D1=vlookup(C1,A:B,2,FALSE).
And got
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b6-py3.6.egg\pycel\excelcompiler.py", line 557, in build_range
if cell_address.address not in self.cell_map:
AttributeError: 'str' object has no attribute 'address'
It seems that it cannot run the str formula, but it do support something like A1&B1, (the & connector in Excel).
Is there any ways to support all the formula in Excel ?
If it's impossible then just add some description in the readme.
Not sure if you can address this or if it's something you can just note in the readme but Networkx 2.1 (which is the default install by pip these days) is incompatible with how you add nodes to graph objects. In build_ast function you pass two arguments when adding a node to the graph (G.add_node(arg1,{'pos':1})) but this is no longer supported by Networkx 2.1 and throws errors. To get your package to work I just installed networkx 1.8 so I just wanted to let you know that (if you didn't know already and for others trying this awesome package out).
I'm not the strongest in programming so it might be my personal issue, but it does not seem to work. I'm trying to run example.py (in pycel/src folder) and I keep running into errors.
Mostly things like this:
File "C:\Python32\lib\site-packages\openpyxl-2.5.0a1-py3.2.egg\openpyxl\utils_init.py", line 5, in
from .cell import (
File "C:\Python32\lib\site-packages\openpyxl-2.5.0a1-py3.2.egg\openpyxl\utils\cell.py", line 9, in
from openpyxl.compat import str
ImportError: cannot import name str
Following the error to the _C:\Python32\Lib\site-packages\openpyxl-2.5.0a1-py3.2.egg\openpyxl\compat, it comes out there is no such file as str but "strings.py". And I keep running into similar errors.
Additionally, not as a problem, but as a question, can it only deal with .xlsx or could I convert .xlsm (macro enabled) formats with that as well?
Edit: Totally my own problem, I was using python3 and thought that compiling each file by using 2to3 would be enough. Naive me :D
Any interest in creating a pandas based version?
COUNT function doesn't work for ranges i.e = COUNT(A1:A5). It only works if the formula is written as =COUNT(A1,A2,A3,A4,A5).
Excel can compute ranges so not sure if this is intended behavior or not.
Openpyxl ==2.5.3
pycel == 1.0b5
python==3.6.5
Run the example.py in pycel.
First, It says:
Traceback (most recent call last):
File "<ipython-input-1-3e8fdd08e02b>", line 1, in <module>
runfile('D:/pycel-master/pycel-master/example/example.py', wdir='D:/pycel-master/pycel-master/example')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/pycel-master/pycel-master/example/example.py", line 9, in <module>
from pycel import ExcelCompiler
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\__init__.py", line 1, in <module>
from .excelcompiler import ExcelCompiler
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 16, in <module>
from pycel.excelwrapper import ExcelOpxWrapper
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelwrapper.py", line 15, in <module>
from openpyxl.cell.cell import Cell, TYPE_FORMULA
ImportError: cannot import name 'TYPE_FORMULA'
Then I fixed it by declare:
TYPE_FORMULA = 'f'
in excelwrapper.py
But got another error:
Loading D:/pycel-master/pycel-master/example\example.xlsx...
C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\worksheet.py:318: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
Traceback (most recent call last):
File "<ipython-input-2-3e8fdd08e02b>", line 1, in <module>
runfile('D:/pycel-master/pycel-master/example/example.py', wdir='D:/pycel-master/pycel-master/example')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/pycel-master/pycel-master/example/example.py", line 34, in <module>
print("D1 is %s" % excel.evaluate('Sheet1!D1'))
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 652, in evaluate
self._gen_graph(address)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 681, in _gen_graph
self._make_cells(seed)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 574, in _make_cells
new_cells = [build_cell(address)]
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 530, in build_cell
excel_range = self.excel.get_range(addr)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelwrapper.py", line 178, in get_range
self.from_excel):
File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1227, in __enter__
self.target = self.getter()
File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1397, in <lambda>
getter = lambda: _importer(target)
File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1084, in _importer
thing = _dot_lookup(thing, comp, import_path)
File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1073, in _dot_lookup
__import__(import_path)
ModuleNotFoundError: No module named 'openpyxl.worksheet._reader'
Which I don't know how to fix.
It seems to be caused by the dismatch between openpyx and pycel.
How to find a proper one? Or fix the remain problems?
I have separated this issue from previous on Named Ranges owing to the need of structural changes to Pycell prior to overcome this.
Some named ranges specified in Office Open XML cannot be read from win32.com
E.g. take the name 'test' which formula is '$A'
This means that it refers to the cell in column $A on the same line where the cell in which this name is used. E.g. if 'test' representing '$A' is used in B4 then $A means $A4, if 'test' representing '$A' is used in G5 then 'test' means '$A5', if 'test' representing '$A' is used in ZZ4 and ZZ5 then '$A' means '$A4' and '$A5' respectively.
When read from the NameManager in Excel, it will appear as $A + the line in which the cursor was when the file was last saved and not $A. Say the cursor was on AB18 then it will appear as $A18. Using win32.com it will not even appear at all in the list of named ranges.
This implies that structurally PyCell must evolve from accessing the COM of Excel to reading the underlying XML. There is a program called openpyxl that just does that. I saw from the log of discussions on Pycell that a merger of the programs was being discussed. Did this proceed?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.