dgorissen / pycel Goto Github PK

View Code? Open in Web Editor NEW

563.0 563.0 147.0 2.17 MB

A library for compiling excel spreadsheets to python code & visualizing them as a graph

License: GNU General Public License v3.0

Python 100.00%

pycel's People

Contributors

Stargazers

Watchers

Forkers

nandigama kfk chrispsn arnabkd ddjokic willu47 johndharrison joe-dunford crsf29 imclab nmearl petroslamb dtpryce phrrngtn ageek yevrah gdementen priestd09 fscottfoti bgola anthill jffernandez den-run-ai zealotous fillll anehrgeiz 18f puneet-shivanand linearregression jmv0 adverley engineerchuan nickhogan geneyoung sanbales mrsteepere hellumellu datatalking suchibrata mattakruger omalleyt12 jptrinastic bibsian dealstat ndirangu13 mitchelllisle winniexf jabellcu dtorres-sf kmader bewernick tjsums iammosespaulr deltaresprojects stephenrauch geneopt adoubbi marknais mikpim01 fabiocs8 girgitt marco-lavagnino mingmingtao danielsjf sandy4321 asitang geojosh skisource nanaposo gitcontainer piti118 elhakimz thotegt andreif maraininc wpf0221 betelguez dror-g shalevy1 chewmag rdmolony avik1993 rmorel organizationusername ckp95 nboukraa jadevinicole sabirimanov victorjmarin luckykarter sjoerdteunisse shingayi-bamhare qibo-cn mhumpher md11235 tyrbonit bogdan-oprescu-nxp maroudja sjanibekova amankagarwal

pycel's Issues

single cell ranges (small bug)

need to detect & bugfix ranges consisting of single cells, e.g. K10:K10

About getting the python code

Hi,

This looks like a very nice tool, I am looking forward to see if I can make use of it. One thing I did not manage when I tried the example, is to update a cell, although admin rights were given.

Another thing I was wondering about is a quick way to get the python code. Any easy way I could get around to it you think?

Thanks for sharing pycel,
Petros

PS: I am running the Anaconda 1.9.2 (64-bit) pack with ipython 2.7 on Windows8 with MSOffice13.

Edit:

I started extracting python code from cells, after looking more carefully into the API.

My best result so far was to grab hold of a cell from the dictionary returned by cellmap, a member of class Spreadsheet, in excelcompiler.py.
Then the python code for that cell can be extracted by the method cell2code in class ExcelCompiler, again in excelcompiler.py.

Pseudocode (with respect to the example):

cell_code = c.cell2code(sp.cellmap['Sheet!Cell'])

Updating cells on a loaded graph also works nicely.

AssertionError during set_value()

I have the next Issue:

AssertionError Traceback (most recent call last)
in
----> 1 excel.set_value('Estados Financieros (1)!F377', 1200)
2 print("D1 is {}".format(excel.evaluate('Resumen (1)!D13')))

~\Anaconda3\lib\site-packages\pycel\excelcompiler.py in set_value(self, address, value, set_as_range)
417 elif address not in self.cell_map:
418 address = AddressRange.create(address).address
--> 419 assert address in self.cell_map

Please help me

excellib module functions mapping incomplete

In the excellib module, following statement provides list of functions:

FUNCTION_MAP = {
"ln":"xlog",
"min":"xmin",
"min":"xmin",
"max":"xmax",
"sum":"xsum",
"gammaln":"lgamma"
}

Xmin is declared twice and there is no def lgamma()

In addition, list of functions in the module is longer, here are those not used in the code:
def value(text)
def average(_args)
def right(text, n)
def index(_args)
def lookup(value, lookup_range, result_range)
def linest(_args, *_kwargs)
def npv(*args)

Is my understanding correct or am I missing somehting?

Error when launching pycel from pyxll add-in

Hi,
when I launch the cell compilation (Compile selection) from excel 2007 using the pyxll add-in I get the following error (it pop-up one menu):

—————————
PyXLL Error
—————————
Error calling function for menu item Compile selection

[global name 'excellib' is not defined]
—————————
OK
—————————

I don't know how to solve this problem, which is related with the global/local role of the file excellib. Thanks a lor for the help

AVERAGEIF not computing for duplicates

If I compute an averageif for the following series: (1,5,3,4,5), I end up with an incorrect result of 4.

I am not sure why this is happening but believe the duplicate 5 is being dropped in the calculation.

split_address doesn't manages Excel NamedRanges

Formula:

=(1+gen_z)/(1+InputData!$G$50)-1

Contains an Excel Named Range called 'gen_z'

Throws error:

Exception: Invalid address format gen_z

Any idea on how to fix this?

File Based Compilation

Have you taken a look at openpyxl for parsing xlsx files? I realize it can't handle the old xls format, but I think they've figured out their formula-parsing problems from earlier. Would be nice to remove the COM requirement.

name power is not defined

>>> excel.evaluate("Sheet1!A27")
Traceback (most recent call last):
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
    ret_val = excel_formula.compiled_lambda()
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
    ### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined
Eval: sqrt(power(_C_("Sheet1!F37"), 2) + power(_C_("Sheet1!G37"), 2))
Traceback (most recent call last):
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
    ret_val = excel_formula.compiled_lambda()
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
    ### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
    ret_val = excel_formula.compiled_lambda()
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
    ### Traceback will show this line if not loaded from a text file
  File "X:\Python\lib\site-packages\pycel\excelcompiler.py", line 619, in _evaluate
    value = self.eval(cell.formula)
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 805, in eval_func
    exc=FormulaEvalError)
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 749, in error_logger
    raise exc(error_msg)
pycel.excelformula.FormulaEvalError: Traceback (most recent call last):
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
    ret_val = excel_formula.compiled_lambda()
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
    ### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined
Eval: sqrt(power(_C_("Sheet1!F37"), 2) + power(_C_("Sheet1!G37"), 2))
Eval: xif(abs(_C_("Sheet1!H37")) < abs(_C_("Sheet1!C37")), 1, 0)
Traceback (most recent call last):
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 801, in eval_func
    ret_val = excel_formula.compiled_lambda()
  File "X:\Python\lib\site-packages\pycel\excelformula.py", line 818, in <lambda>
    ### Traceback will show this line if not loaded from a text file
NameError: name 'power' is not defined

NPV function fails when passed range of cashflows

What actually happened

I have an excel file (call it "test.xlsx") with the following layout:

Sheet1!A1 -> =NPV(0.1,B1:C1)
Sheet1!B1 -> 10
Sheet1!C1 -> 20

I am attempting to evaluate cell Sheet1!A1 with the following code in test.py

  from pycel import ExcelCompiler
  excel = ExcelCompiler(filename="test.xlsx")
  print("A1 is %s" % excel.evaluate('Sheet1!A1'))
  excel.set_value('Sheet1!B1', 200)
  print('A1 is now %s' % excel.evaluate('Sheet1!A1'))

This produces a value error I believe caused by passing a range instead of a comma-separated list of cells:

env ❯ python test.py
A1 is 25.6198347107438
A1 is now #VALUE!

What was expected to happen

I would expect this to evaluate A1 to 198.347 and produce the following console log:

env ❯ python test.py
A1 is 25.6198347107438
A1 is now 198.3471074380165

If I change the formula in cell A1 to =NPV(0.1, B1,C1) this does evaluate correctly

Environment

env ❯ pip freeze --local
decorator==4.4.1
et-xmlfile==1.0.1
jdcal==1.4.1
networkx==2.4
numpy==1.18.1
openpyxl==3.0.3
pycel==1.0b22
python-dateutil==2.8.1
ruamel.yaml==0.16.10
ruamel.yaml.clib==0.2.0
six==1.14.0

env ❯ python -V
Python 3.7.5

>>> import platform
>>> platform.platform()
Darwin-19.3.0-x86_64-i386-64bit'

Generating 1 graph for multiple sheets

I've been using Pycel for a few weeks. I am trying to change input on 1 sheet in one workbook, the only thing is that the value doesn't get updated on the sheet that I want to change it on.

Is there any way that I can load multiple sheets in 1 graph?

Cheers

LINEST function fails w/ Workbook created in LibreOffice

Hi,
thanks for this piece of software!

Problem description

Execution of LINEST function crashes. Simple test case attached.

What actually happened

Loading test2.xlsx...
Traceback (most recent call last):
File "compileExcel4.py", line 12, in
print("D1 is {}".format(excel.evaluate('Sheet1!D1')))
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 806, in _evaluate_non_iterative
self._gen_graph(address)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 875, in _gen_graph
self._process_gen_graph()
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 887, in _process_gen_graph
self._gen_graph(precedent_address, recursed=True)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 871, in _gen_graph
self._make_cells(seed)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 717, in _make_cells
new_nodes = build_range(excel_data)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 701, in build_range
for addr in a_range.needed_addresses:
File "/usr/local/lib/python3.5/dist-packages/pycel/excelcompiler.py", line 1031, in needed_addresses
return self.formula and self.formula.needed_addresses or iter(self)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 614, in needed_addresses
if self.python_code:
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 636, in python_code
self._python_code = self.ast.emit
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 442, in emit
return handler()
File "/usr/local/lib/python3.5/dist-packages/pycel/excelformula.py", line 481, in func_linest
degree, coef = get_linest_degree(self.cell)
File "/usr/local/lib/python3.5/dist-packages/pycel/excelutil.py", line 864, in get_linest_degree
address.address_at_offset(row_inc=0, col_inc=i))
AttributeError: 'AddressRange' object has no attribute 'address_at_offset'

What was expected to happen

D1 is 0.9224

Code Sample

The Excel work book contain a few number in A1:B5 and the LINEST-function

{=LINEST(A1:A5,B1:B5)}

It has been created in Libreoffice and saved in xlsx-format. I am happy to provide it.

from pycel import ExcelCompiler

filename = "test2.xlsx"
print("Loading {}...".format(filename))

# load & compile the file to a graph
excel = ExcelCompiler(filename=filename)

#print("Setting B1 to 0.6")
#excel.set_value('Sheet1!B1', 0.6)

print("D1 is {}".format(excel.evaluate('Sheet1!D1')))

Environment

Pycel 1.0b22, Python 3.5.2 and Ubuntu 16.04.

concatenating str with int/float error

Hello again.

I think I may have run into another small bug and that is when concatenating text with numbers.

the excel formula:

="some text" & A17 & "more text"
where A17 is a number,

is translated to:

'some text' + eval_cell(A17) + 'more text',

which returns an error as is a concatenation of str with int/float.

I worked around this by wrapping the TEXT() function around A17 in the excel version of the formula and mapping text -> str in pycel.

Thanks in advance,
Petros

What would it take to make this work with libreoffice as well?

While the difference may be so great that it would make more sense to start from scratch: What would it take to make pycel work with LibreOffice Calc?

excelformula._parse_to_rpn crashes with space after "=" in formula

excelformula._parse_to_rpn raise Exception with the formula "= (1,5 * (1 + B11 *B3 ^ B12) + 5) + 10" and worked correctly when I removed the space after the equal sign: "=(1,5 * (1 + B11 *B3 ^ B12) + 5) + 10". Excel works fine in either case.

NameError: global name 'excellib' is not defined

Just trying to run the example.
Win 7, Excel 2010, Python 2.7.3 |EPD_free 7.3-2 (32-bit)

Message File Name Line Position
Traceback
C:\Research\Test\pycel-master\Test\example.py 22
gen_graph C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 632
cell2code C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 570
shunting_yard C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 477
create_node C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 339
init C:\Python 27 32\lib\site-packages\pycel\excelcompiler.py 258
exceptions.NameError: global name 'excellib' is not defined

Any suggestions?

AttributeError: 'DiGraph' object has no attribute 'node'

There is a reference to undefined attribute. Should it be ._node instead?

pycel/src/pycel/excelcompiler.py

Lines 677 to 678 in 146e198

 self.dep_graph.node[node]['sheet'] = node.sheet 

 self.dep_graph.node[node]['label'] = node.address.coordinate

Same in

pycel/src/pycel/excelformula.py

Line 245 in 146e198

self._children = sorted(args, key=lambda x: self.ast.node[x]['pos'])

Ranged names code breaks the running of the example code

Running the example code gives me an error, which could be traced back to the 'ranged names' code in 'excelwrapper.py' :

  self.rangednames = np.zeros(shape = (int(self.app.ActiveWorkbook.Names.Count),1), dtype=[('id', 'int_'), ('name', 'S200'), ('formula', 'S200')])
+        for i in range(0, self.app.ActiveWorkbook.Names.Count):
+            self.rangednames[i]['id'] = int(i+1)       
+            self.rangednames[i]['name'] = str(self.app.ActiveWorkbook.Names.Item(i+1).Name)        
+            self.rangednames[i]['formula'] = str(self.app.ActiveWorkbook.Names.Item(i+1).Value)

In many cases at the initial stage the app property hasn't been initialized, so that app is None. Hence, the error :

   File "C:\pycel-master\src\pycel\excelwrapper.py", line 30, in __init__
    self.rangednames = np.zeros(shape = (int(self.app.ActiveWorkbook.Names.Count),1), dtype=[('id', 'int_'), ('name', 'S200'), ('formula', 'S200')])

AttributeError: 'NoneType' object has no attribute 'ActiveWorkbook'

TypeError on iterative models

I have tested this issue on multiple complex models that use iterations and keep coming up against this error.

  File "C:\Users\a\.conda\envs\pycel\lib\site-packages\pycel\excelutil.py", line 1317, in done
    return (self.ns.iteration_number >= self.ns.iterations or
TypeError: '>=' not supported between instances of 'int' and 'NoneType

Any advice or suggestions would be helpful on why this error occurs,

While Process a large xlsx , get 'ReadOnlyWorksheet' object has no attribute 'iter_cols' from Openpyxl

I got a large xlsx file (621KB) and many excel formulas(mid/vlookup/or something) cross using in it .
Tried to run the example code, and got error msg like:

Traceback (most recent call last):

  File "<ipython-input-1-d87f46d59f95>", line 1, in <module>
    runfile('C:/Users/Administrator/Desktop/file/example2.py', wdir='C:/Users/Administrator/Desktop/file')

  File "d:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "d:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Administrator/Desktop/file/example2.py", line 34, in <module>
    print("D1 is %s" % excel.evaluate('S1!C60'))

  File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 652, in evaluate
    self._gen_graph(address)

  File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 685, in _gen_graph
    self._process_gen_graph()

  File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 697, in _process_gen_graph
    self._gen_graph(precedent_address, recursed=True)

  File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 681, in _gen_graph
    self._make_cells(seed)

  File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 572, in _make_cells
    new_cells = build_range(address)

  File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 551, in build_range
    excel_range = self.excel.get_range(rng)

  File "d:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelwrapper.py", line 186, in get_range
    cells_dataonly = sheet_dataonly[address.coordinate]

  File "d:\ProgramData\Anaconda3\lib\site-packages\openpyxl\worksheet\_read_only.py", line 41, in __getitem__
    return meth(key)

  File "d:\ProgramData\Anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py", line 292, in __getitem__
    cols = tuple(self.iter_cols(min_col, max_col))

AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'

Which might be caused by readonly mode in openpyxl.
How to fix this? Say, turn to writeonly mode?
All the calculation has been set up using the MS Excel interface ,I just want to try change some numbers and regenerate the particular Cells to get the output. I.E. to use the xlsx file as a calculate engine.
I might try to edit the xlsx data and use win32com to open the xlsx file and refresh the output cell, But it is a little slow rather than 50ms.
Any suggestions?

Fails due to missing `calcPr` property.

What actually happened

  File "/Users/andrei/parsers/xlsx.py", line 141, in _load_pycel_workbook
    return ExcelCompiler(filename)
  File "/Users/andrei/venv/lib/python3.8/site-packages/pycel/excelcompiler.py", line 95, in __init__
    wb_cycles = bool(self.excel.workbook.calculation.iterate)
AttributeError: 'NoneType' object has no attribute 'iterate'

What was expected to happen

It should work.

Problem description

When a workbook package is missing calcPr property then ExcelCompiler is failing to process the document.

VLOOKUP returns last value found

In the attached test file, the vlookup formula in G4 in excel returns 20, but pycel returns the last value found of 50.

TestModel_v2.xlsx

_

TypeError with iferror in special case

In the event an iferror formula does not have a second argument but has a valid first argument, I am getting a TypeError: iferror() missing 1 required positional argument: 'value_if_error' Eval: iferror(_C_("Sheet1!A1") / _C_("Sheet1!A2"))

For e.g. a simple formula such as =iferror(12/3,) which works in excel will result in the above error.

COUNTIF and COUNTIFS

If a range containing an empty cell or cells were to be passed to either COUNT or COUNTIF, I get a TypeError as follows:

TypeError: unorderable types: int() >= NoneType()

In normalize_year(), month % 12 can be 0 -> IllegalMonthError

pycel/src/pycel/excelutil.py

Line 990 in 2c4d198

m = m % 12

Traceback (most recent call last):
File "\PyCel\venv\lib\site-packages\pycel\excellib.py", line 190, in date
year, month, day = normalize_year(year, month, day)
File "\PyCel\venv\lib\site-packages\pycel\excelutil.py", line 913, in normalize_year
days_in_month = get_max_days_in_month(m, y)
File "\PyCel\venv\lib\site-packages\pycel\excelutil.py", line 892, in get_max_days_in_month
return calendar.monthrange(year, month)[1]
File "\bin\python3.7\lib\calendar.py", line 124, in monthrange
raise IllegalMonthError(month)
calendar.IllegalMonthError: bad month number 0; must be 1-12

Compiling Excel spreadsheets to Python

Please have a look at:

(vallettea/koala#249)

Binder Notebook Example not Working

What actually happened

I tried to run the pycel example on binder but it gave an error ``NameError: name 'c' is not defined`. I think the example is missing some parts.

best,

Miguel

Support for iterative calculation

Microsoft Excel currently includes functionality to perform iterative calculations when formulas contain circular references. By default further processing of circular formulas is broken after 100 iterations or once the value of the target cell changes by less than 0.01.

Current evaluation of pycel on formula with circular references does not take number of iterations or minimum value change into account during its execution. The result is an infinite cyclical execution that only breaks once the maximum recursion depth is reached.

Would there be any way to add this functionality or otherwise provide direction to proper execution state locations so that this functionality can be added via PR?

Thanks for the incredible library!

Change of Licensing of the project is not possible

@thecapacity In a72203a you change the GPL-v3 licensing of this project to Public-Domain.

IANAL, but I believe you are not entitled to that, although your motives are kind.

My understanding is this:

You work under the US-gov, so all you code has to be in public-domain.
Your additions are indeed in public domain, and you are ok with adding an exception in the LICENSE file.
But on the derivative work, both licensing terms apply, but since GPLv3 is more "strict", it takes precedence.

Practically, the project has to remain under GPLv3 and not under the public domain.

re-license the parser/tokenizer under a license suitable for incorporation into openpyxl

Dirk, would you be OK with re-licensing the parsing and tokenizing code under a license which is compatible with openpyxl?

Incorrect evaluated formula result for unbounded column

What actually happened

The excel formula result of SUM()/COUNTIFS is incorrect for specific case:

=SUM(B15*Configuration!B4,B16*Configuration!B5,B17*Configuration!B6,B18*Configuration!B7,B19*Configuration!B8)
=COUNTIFS(Result!J:J,"Highest",Result!E:E,"fail")

formulas of B15-B19

What was expected to happen

The excel formula result should be -24 instead of 0

Problem description

Read excel through ExcelCompiler
Calculate formula through evaluate()

Code Sample

from pycel import ExcelCompiler
cp = ExcelCompiler(filename=PATH("data/TestData.xlsx"))
print("Report!B8: ", cp.evaluate('Report!B8'))
print("Report!B23: ", cp.evaluate('Report!B23'))
print("Report!B25: ", cp.evaluate('Report!B25'))

Environment

windows7 64bit
pycel:  1.0b16
python: 3.6.3

High RAM consumption when evaluating a formula

I have a 2.11MB of Excel file. When using pycel for evaluating the formula. The script's RAM consumption is 4GB+.
When I evaluate a ~400kb's Excel file the RAM consumption is just 200MB+
I would like to know how does evaluate work. Does it read through every single cell?
How can I prevent high RAM consumption when evaluating a formula?

empty cells seem to be translated as None

Hello,

I returned to using pycel recently and noticed that I get wrong results in this example:

excel formula:
=IF(A1<=3;0;1)
where A1 is an empty cell.

The above formula will return 0 in microsoft excel, as unacceptable as it may be. My guess is that excel treats empty cells as 0 and not as NULL (None in Python).

Pycel, like any sane program, will assign None value to A1 and the above formula will return 1.

Unfortunately, the excel version is the accepted behavior and I need to locate in pycel, where an empty cell is translated into a None value and turn that into 0.

Any pointers to where I should start looking?

Thanks for any suggestions,

Petros

Is there a way to get the list of dependent cells?

I am trying to see if there is an easy way to get a list of all dependencies for a particular cell. I understand I could view the graph, but I'd like to get the dependencies in a linear form (list, dictionary, tuple etc.) for further analysis.

Which function can I call in the ExcelCompiler class to obtain this for a particular cell?

I tried using the cell_map function but I don't believe it lays out ALL the dependencies for a given cell. From what I can tell, its simply displaying the formula in a given cell.

I see there is a needed_addresses function in the _Cell class, however I am not clear how to access that function from the ExcelCompiler instance.

Any help would be appreciated!

AVERAGEIFS

Dear dgorissen,

I'm a fan of your excellent work with Pycel! I am interested in contributing to the codebase, first by creating a version of the AVERAGEIFS function in Excel. Would you accept such a pull request? Where would you recommend getting started? Thanks!

Running example file

Hello,
New user of python. I am trying to run the example given with pycel.
I set up an environment with Python 2.7, install pycell, network, numpy, matplotlip dependencies.
In a Jupyter notebook, when I run:

    print "Compiling..., starting from D1"
    sp = c.gen_graph('D1',sheet='Sheet1')

I get the error:


Compiling..., starting from D1
Seed D1 expanded into 1 cells
1 filtered seeds 
1 cells on the todo list
Handling  Sheet1!D1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-bf6729dbb0ea> in <module>()
      1 print("Compiling..., starting from D1")
----> 2 sp = c.gen_graph('D1', sheet='Sheet1')

C:\[...]anaconda3\envs\pycel\lib\site-packages\pycel-0.0.1-py2.7.egg\pycel\excelcompiler.pyc in gen_graph(self, seed, sheet)
    639 
    640             # parse the formula into code
--> 641             pystr,ast = self.cell2code(c1)
    642 
    643             # set the code & compile it (will flag problems sooner rather than later)

C:\[...]\anaconda3\envs\pycel\lib\site-packages\pycel-0.0.1-py2.7.egg\pycel\excelcompiler.pyc in cell2code(self, cell)
    578         if cell.formula:
    579             e = shunting_yard(cell.formula or str(cell.value))
--> 580             ast,root = build_ast(e)
    581             code = root.emit(ast,context=Context(cell,self.excel))
    582         else:

C:\[...]anaconda3\envs\pycel\lib\site-packages\pycel-0.0.1-py2.7.egg\pycel\excelcompiler.pyc in build_ast(expression)
    540             #    G.add_edge(stack.pop(),n)
    541         else:
--> 542             G.add_node(n,{'pos':0})
    543 
    544         stack.append(n)

TypeError: add_node() takes exactly 2 arguments (3 given)

Sounds very basic, but I can't figure out what is wrong? Not correct version of networkx?

Pycel raises NotImplementedError on rectangular ranges

What actually happened

Pycel raises a exception on rectangular intervals if they are written as e.g., Sheet1!A1:Sheet1!A9 instead of Sheet1!A1:A9. Both are exactly the same interval.

What was expected to happen

Both intervals should nor raise exceptions,since they are the same.

Problem description

Pycel is interpreting intervals defined as e.g. Sheet1!A1:Sheet1!A9 as non rectangular. They are rectangular if the sheet reference is the same on both ends of the range.

Code Sample

I attached a test file Test Interval.xlsx to repeat the problem.

The code to reproduce the problem is

from pycel import ExcelCompiler

filename = 'Test Interval.xlsx'
model = ExcelCompiler(filename)

# This is a rectangular formula
model.evaluate('Sheet2!A1')

My traceback was

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-1-35ef567d212e> in <module>
      5 
      6 # This is a rectangular formula
----> 7 model.evaluate('Sheet2!A1')

c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in _evaluate_non_iterative(self, address)
    804 
    805             if address.address not in self.cell_map:
--> 806                 self._gen_graph(address)
    807 
    808         result = self._evaluate(str(address))

c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in _gen_graph(self, seed, recursed)
    873         if not recursed:
    874             # if not entered to process one cell / cellrange process other work
--> 875             self._process_gen_graph()
    876 
    877     def _process_gen_graph(self):

c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in _process_gen_graph(self)
    883             self.log.debug("Handling {}".format(dependant.address))
    884 
--> 885             for precedent_address in dependant.needed_addresses:
    886                 if precedent_address.address not in self.cell_map:
    887                     self._gen_graph(precedent_address, recursed=True)

c:\users\rodrigo\documents\github\pycel\src\pycel\excelcompiler.py in needed_addresses(self)
   1069     @property
   1070     def needed_addresses(self):
-> 1071         return self.formula and self.formula.needed_addresses or ()
   1072 
   1073 

c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in needed_addresses(self)
    612         if self._needed_addresses is None:
    613             # get all the cells/ranges this formula refers to, and remove dupes
--> 614             if self.python_code:
    615                 code = iter((self.python_code.encode(),))
    616                 tokens = tuple(tk.tokenize(lambda: next(code)))

c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in python_code(self)
    634                 self._python_code = ''
    635             else:
--> 636                 self._python_code = self.ast.emit
    637         return self._python_code
    638 

c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in emit(self)
    445             # map to the correct name
    446             return "{}({})".format(
--> 447                 self.func_map.get(func, func), self.comma_join_emit())
    448 
    449     @staticmethod

c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in comma_join_emit(self, fmt_str, to_emit)
    422             to_emit = self.children
    423         if fmt_str is None:
--> 424             return ", ".join(n.emit for n in to_emit)
    425         else:
    426             return ", ".join(

c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in <genexpr>(.0)
    422             to_emit = self.children
    423         if fmt_str is None:
--> 424             return ", ".join(n.emit for n in to_emit)
    425         else:
    426             return ", ".join(

c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in emit(self)
    344     @property
    345     def emit(self):
--> 346         return self._emit()
    347 
    348     def _emit(self, value=None):

c:\users\rodrigo\documents\github\pycel\src\pycel\excelformula.py in _emit(self, value)
    354         try:
    355             addr_str = value.replace('$', '')
--> 356             address = AddressRange.create(addr_str, sheet=sheet, cell=self.cell)
    357         except ValueError:
    358             # check for table relative address

c:\users\rodrigo\documents\github\pycel\src\pycel\excelutil.py in create(cls, address, sheet, cell)
    344             return address
    345 
--> 346         sheetname, addr = split_sheetname(address, sheet=sheet)
    347         addr_tuple, sheetname = range_boundaries(
    348             addr, sheet=sheetname, cell=cell)

c:\users\rodrigo\documents\github\pycel\src\pycel\excelutil.py in split_sheetname(address, sheet)
    547         sh, address_part = address.split('!', maxsplit=1)
    548         if '!' in address_part:
--> 549             raise NotImplementedError(
    550                 "Non-rectangular formulas: {}".format(address))
    551         sh = unquote_sheetname(sh)

NotImplementedError: Non-rectangular formulas: Sheet1!A1:'Sheet1'!A9

Environment

Windows 7
Python using Anaconda distribution
Pycel version 1.0b22
Python 3.8.3

`tokenize.TokenError: ('EOF in multi-line statement', (2, 0))` on a SUMIF with escaped double-quotes

What actually happened

I ran into tokenize.TokenError: ('EOF in multi-line statement', (2, 0)) when trying to evaluate cells in a large workbook. After some detective work I narrowed it down to a cell containing this formula:

=SUMIF(B12:B15, ">""", C12:C15)

The purpose of this formula is to only add up cells in the C-range whose corresponding cells in the B-range contain a non-whitespace string. Here is a minimal reproducible example spreadsheet:

Cell C1 contains the formula =SUMIF(A1:A5, ">""", B1:B5). Running this code:

from pycel import ExcelCompiler

fname = "test.xlsx"
excel = ExcelCompiler(filename=fname)

print(excel.evaluate("Sheet1!C1"))

I get the traceback:

$ "c:/Users/ckp95/Documents/pycel_test/.venv/Scripts/python.exe" "c:/Users/ckp95/Documents/pycel_test/foo.py"
Traceback (most recent call last):
  File "c:/Users/ckp95/Documents/pycel_test/foo.py", line 6, in <module>
    print(excel.evaluate("Sheet1!C1"))
  File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 808, in _evaluate_non_iterative
    self._gen_graph(address)
  File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 877, in _gen_graph
    self._process_gen_graph()
  File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 887, in _process_gen_graph
    for precedent_address in dependant.needed_addresses:
  File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelcompiler.py", line 1073, in needed_addresses
    return self.formula and self.formula.needed_addresses or ()
  File "C:\Users\ckp95\Documents\pycel_test\.venv\lib\site-packages\pycel\excelformula.py", line 616, in needed_addresses
    tokens = tuple(tk.tokenize(lambda: next(code)))
  File "C:\Users\ckp95\.pyenv\pyenv-win\versions\3.7.2\lib\tokenize.py", line 579, in _tokenize
    raise TokenError("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (2, 0))

I'm guessing this is because the parser doesn't know how to deal with escaped double-quotes.

What was expected to happen

It should print 6.

(Also it would be nice if the TokenError traceback told me which cell was the problem. For the original sheet I had to add print debug statements into the excelformula.py file to figure out which intermediate cell in the calculation chain was triggering it)

Environment

OS: Windows 10
Python version: 3.7.2
Pycel version: 1.0b22

About using unicode with pycel

Hi,

I am thinking of using pycel for a excel-to-python conversion model. Using python 2.7, in order to make sure everything is compatible, I ran into a few classic foreign character issues:

i.e.
ValueError 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

To make play nice, I have restricted my sheet names to be standard latin-character based and made the following changes to excelutil.py, to enable foreign characters to cells:

@@ -81,15 +81,15 @@ class Cell(object):

(-) self.__sheet = str(sheet)
(-) self.__formula = str(formula) if formula else None
(+) self.__sheet = sheet.encode('utf-8')
(+) self.__formula = formula.encode('utf-8') if formula else None

self.__sheet = sh
self.__col = c
self.__row = int(r)
self.__col_idx = col2num(c)

( -) self.value = str(value) if isinstance(value,unicode) else value
(+) self.value = value.encode('utf-8') if isinstance(value,unicode) else value
self.python_expression = None
self._compiled_expression = None

Now it seems like the gen_graph and evaluation work nicely, but creating a graph from the derived object still fails.

Thanks,
Petros

gen_graph cannot deal with a (complex) formula

Please note that Pycel deals with a file with hundreds of formula very well until:

I get to this cell called InputData!H50 the Excel is as follow:

=IFERROR(IF(year_modelStart>=year_baseCosts;2%;AVERAGE(L50:OFFSET(K50;0;MATCH(year_baseCosts;CA_Years;0))));2%)

New RangedNames feature deals with it properly and delivers to Python:

=IFERROR(IF('InputData'!$G$14>='InputData'!$G$15,2%,AVERAGE(L50:OFFSET(K50,0,MATCH('InputData'!$G$15,'InputData'!$L$5:$DG$5,0)))),2%)

So far so good but at some point I hit the wall in the emit function:

ss = args[0].emit(ast,context=context) + op + args[1].emit(ast,context=context)

IndexError, list out of range

With basically args[1].emit(ast,context=context) being out of range for some reason I cannot understand (it has passed dozens of if_error and it never called the def if_error function I created).

Not to be deterred, I add to my code the following try/except to by-pass eventual index issues (not a great idea when dealing with great code I'd suppose, but can one really resist looking over the hill?):

        try:
            ss = args[0].emit(ast,context=context) + op + args[1].emit(ast,context=context)
        except (IndexError):
            ss = args[0].emit(ast,context=context) + op

Without surprise it comes back to haunt me:

Exception: Failed to compile cell InputData!H50 with expression iferror((2% if eval_cell("InputData!G14")>=(eval_cell("InputData!G15") if eval_cell("InputData!G15") is not None else float('inf')) else average(l50:offset(eval_cell("InputData!K50"),0,match(eval_cell("InputData!G15"),eval_range("InputData!L5:DG5"),0)))),2%): invalid syntax (, line 1)

Any idea where to start?

PV function

I added support for the PV function (https://support.office.com/en-us/article/pv-function-23879d31-0e02-4321-be01-da16e8168cbd) to pycel and want to contribute the changes back to the repository.

MATCH and VLOOKUP issue

Facing some errors with MATCH and VLOOKUP in the attached sheet.

vlookup_temp.xlsx

If I set value for C11 to "Arena":

pycel's result for MATCH in C12 is 3, it should be 2 (MATCH running at 1 match type)
pycel's result for VLOOKUP in C13 is 3, it should be 2 (VLOOKUP running waicth TRUE approx match

Unbounded Range Addresses (ie: A:B or 1:2) broken

processing a xlsx file like this

a	f	a	f
b	g		
c	h		
d	i		
e	j

The 'f' on sheet1!D1=vlookup(C1,A:B,2,FALSE).
And got


  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b6-py3.6.egg\pycel\excelcompiler.py", line 557, in build_range
    if cell_address.address not in self.cell_map:

AttributeError: 'str' object has no attribute 'address'

It seems that it cannot run the str formula, but it do support something like A1&B1, (the & connector in Excel).
Is there any ways to support all the formula in Excel ?
If it's impossible then just add some description in the readme.

Dependency Issue (Graph.add_node() error)

Not sure if you can address this or if it's something you can just note in the readme but Networkx 2.1 (which is the default install by pip these days) is incompatible with how you add nodes to graph objects. In build_ast function you pass two arguments when adding a node to the graph (G.add_node(arg1,{'pos':1})) but this is no longer supported by Networkx 2.1 and throws errors. To get your package to work I just installed networkx 1.8 so I just wanted to let you know that (if you didn't know already and for others trying this awesome package out).

Is it still in working condition?

I'm not the strongest in programming so it might be my personal issue, but it does not seem to work. I'm trying to run example.py (in pycel/src folder) and I keep running into errors.
Mostly things like this:

File "C:\Python32\lib\site-packages\openpyxl-2.5.0a1-py3.2.egg\openpyxl\utils_init.py", line 5, in
from .cell import (
File "C:\Python32\lib\site-packages\openpyxl-2.5.0a1-py3.2.egg\openpyxl\utils\cell.py", line 9, in
from openpyxl.compat import str
ImportError: cannot import name str

Following the error to the _C:\Python32\Lib\site-packages\openpyxl-2.5.0a1-py3.2.egg\openpyxl\compat, it comes out there is no such file as str but "strings.py". And I keep running into similar errors.

Additionally, not as a problem, but as a question, can it only deal with .xlsx or could I convert .xlsm (macro enabled) formats with that as well?

Edit: Totally my own problem, I was using python3 and thought that compiling each file by using 2to3 would be enough. Naive me :D

Pandas based version

Any interest in creating a pandas based version?

COUNT not working for ranges

COUNT function doesn't work for ranges i.e = COUNT(A1:A5). It only works if the formula is written as =COUNT(A1,A2,A3,A4,A5).

Excel can compute ranges so not sure if this is intended behavior or not.

Find a proer version openpyxl ?Cannot import name 'TYPE_FORMULA'/No module named 'openpyxl.worksheet._reader'

Openpyxl ==2.5.3
pycel == 1.0b5
python==3.6.5
Run the example.py in pycel.
First, It says:

Traceback (most recent call last):

  File "<ipython-input-1-3e8fdd08e02b>", line 1, in <module>
    runfile('D:/pycel-master/pycel-master/example/example.py', wdir='D:/pycel-master/pycel-master/example')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/pycel-master/pycel-master/example/example.py", line 9, in <module>
    from pycel import ExcelCompiler

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\__init__.py", line 1, in <module>
    from .excelcompiler import ExcelCompiler

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 16, in <module>
    from pycel.excelwrapper import ExcelOpxWrapper

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelwrapper.py", line 15, in <module>
    from openpyxl.cell.cell import Cell, TYPE_FORMULA
ImportError: cannot import name 'TYPE_FORMULA'

Then I fixed it by declare:
TYPE_FORMULA = 'f' in excelwrapper.py
But got another error:

Loading D:/pycel-master/pycel-master/example\example.xlsx...
C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\worksheet.py:318: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)
Traceback (most recent call last):

  File "<ipython-input-2-3e8fdd08e02b>", line 1, in <module>
    runfile('D:/pycel-master/pycel-master/example/example.py', wdir='D:/pycel-master/pycel-master/example')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/pycel-master/pycel-master/example/example.py", line 34, in <module>
    print("D1 is %s" % excel.evaluate('Sheet1!D1'))

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 652, in evaluate
    self._gen_graph(address)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 681, in _gen_graph
    self._make_cells(seed)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 574, in _make_cells
    new_cells = [build_cell(address)]

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelcompiler.py", line 530, in build_cell
    excel_range = self.excel.get_range(addr)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pycel-1.0b5-py3.6.egg\pycel\excelwrapper.py", line 178, in get_range
    self.from_excel):

  File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1227, in __enter__
    self.target = self.getter()

  File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1397, in <lambda>
    getter = lambda: _importer(target)

  File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1084, in _importer
    thing = _dot_lookup(thing, comp, import_path)

  File "C:\ProgramData\Anaconda3\lib\unittest\mock.py", line 1073, in _dot_lookup
    __import__(import_path)

ModuleNotFoundError: No module named 'openpyxl.worksheet._reader'

Which I don't know how to fix.
It seems to be caused by the dismatch between openpyx and pycel.
How to find a proper one? Or fix the remain problems?

win32.com cannot acquire all Named Ranges

I have separated this issue from previous on Named Ranges owing to the need of structural changes to Pycell prior to overcome this.

Some named ranges specified in Office Open XML cannot be read from win32.com

E.g. take the name 'test' which formula is '$A'

This means that it refers to the cell in column $A on the same line where the cell in which this name is used. E.g. if 'test' representing '$A' is used in B4 then $A means $A4, if 'test' representing '$A' is used in G5 then 'test' means '$A5', if 'test' representing '$A' is used in ZZ4 and ZZ5 then '$A' means '$A4' and '$A5' respectively.

When read from the NameManager in Excel, it will appear as $A + the line in which the cursor was when the file was last saved and not $A. Say the cursor was on AB18 then it will appear as $A18. Using win32.com it will not even appear at all in the list of named ranges.

This implies that structurally PyCell must evolve from accessing the COM of Excel to reading the underlying XML. There is a program called openpyxl that just does that. I saw from the log of discussions on Pycell that a merger of the programs was being discussed. Did this proceed?

	self.dep_graph.node[node]['sheet'] = node.sheet
	self.dep_graph.node[node]['label'] = node.address.coordinate

dgorissen / pycel Goto Github PK

pycel's People

Contributors

Stargazers

Watchers

Forkers

pycel's Issues

What actually happened

What was expected to happen

Environment

Problem description

What actually happened

What was expected to happen

Code Sample

Environment

What actually happened

What was expected to happen

Problem description

What actually happened

What actually happened

What was expected to happen

Problem description

Code Sample

Environment

What actually happened

What was expected to happen

Problem description

Code Sample

Environment

What actually happened

What was expected to happen

Environment

Recommend Projects

Recommend Topics

Recommend Org