Evaluating Excel Formulas with Apache POI

Apache POI is a full-featured Java library for reading several different Microsoft file formats, including Excel. (Note: this post only covers POI version 3.0.2-FINAL released in February 2008).

POI supports evaluating cells with formulas containing Excel functions. For example, consider the code below for reading a java.util.Date from a simple spreadsheet with the formula “=NOW()” in cell A1 which outputs Mon Apr 28 22:48:57 CDT 2008:

 1 HSSFWorkbook workbook = new HSSFWorkbook(inputStream);
 2 HSSFSheet sheet = workbook.getSheet("Sheet1");
 3 HSSFRow row = sheet.getRow(0); // 0-based.
 4 HSSFCell cell = row.getCell((short) 0); // Also 0-based;
 5 System.out.println(cell.getDateCellValue()); 
 

Code Listing 1: Reads cached formula result.

This works great! But on closer examination, you’ll notice that the value returned is the saved result of the last formula evaluation in the spreadsheet, meaning that it will always return the same (in this case) Mon Apr 28... value. What should you do if you want a dynamic value from the formula, e.g. when the program is run tomorrow?

The answer is to use POI’s HSSFFormulaEvaluator. The API evaluateFormulaCell() does the work of dynamically evaluating a cell formula and saving the result in the cell record. The code below shows how this works, and sleeps for one second in order to prove that the date returned in the example is in fact not pre-calculated:

 1 HSSFFormulaEvaluator formulaEvaluator = new HSSFFormulaEvaluator(sheet, workbook);
 2 formulaEvaluator.setCurrentRow(row);
 3 for (int i = 0; i < 3; ++i) {
 4     formulaEvaluator.evaluateFormulaCell(cell);
 5     System.out.println(cell.getDateCellValue());
 6     Thread.sleep(1000);
 7 }
 

Code Listing 2: On-the-fly formula evaluation.

So we’re done, right? Not yet, unfortunately. If you run the code above, you’ll encounter this exception:

Exception in thread "main" java.lang.RuntimeException: Cannot Parse, sorry : Found reference to named range "SEMIVOLATILE", but that named range wasn't defined! @ 22 [Formula String was: 'NOW(ATTR(semiVolatile))']
at org.apache.poi.hssf.model.FormulaParser.Abort(FormulaParser.java:129)
at org.apache.poi.hssf.model.FormulaParser.Ident(FormulaParser.java:335)
at org.apache.poi.hssf.model.FormulaParser.Factor(FormulaParser.java:522)
at org.apache.poi.hssf.model.FormulaParser.Term(FormulaParser.java:660)
at org.apache.poi.hssf.model.FormulaParser.Expression(FormulaParser.java:708)
at org.apache.poi.hssf.model.FormulaParser.Arguments(FormulaParser.java:486)
at org.apache.poi.hssf.model.FormulaParser.function(FormulaParser.java:358)
at org.apache.poi.hssf.model.FormulaParser.Ident(FormulaParser.java:280)
at org.apache.poi.hssf.model.FormulaParser.Factor(FormulaParser.java:522)
at org.apache.poi.hssf.model.FormulaParser.Term(FormulaParser.java:660)
at org.apache.poi.hssf.model.FormulaParser.Expression(FormulaParser.java:708)
at org.apache.poi.hssf.model.FormulaParser.Arguments(FormulaParser.java:486)
at org.apache.poi.hssf.model.FormulaParser.function(FormulaParser.java:358)
at org.apache.poi.hssf.model.FormulaParser.Ident(FormulaParser.java:280)
at org.apache.poi.hssf.model.FormulaParser.Factor(FormulaParser.java:522)
at org.apache.poi.hssf.model.FormulaParser.Term(FormulaParser.java:660)
at org.apache.poi.hssf.model.FormulaParser.Expression(FormulaParser.java:708)
at org.apache.poi.hssf.model.FormulaParser.parse(FormulaParser.java:812)
at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.internalEvaluate(HSSFFormulaEvaluator.java:361)
at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluateFormulaCell(HSSFFormulaEvaluator.java:241)
at net.chipkillmar.poi_excel_functions.Example.main(Example.java:23)

Output Listing 1: Exception thrown executing Code Listing 2.

This is a cryptic way of saying that the NOW() function isn’t implemented in POI. It turns out that it’s fairly easy to understand why this exception happens and to fix it, though.

Apache has some good online documentation for developers here about formula evaluation. In a nutshell, formula strings are parsed into Reverse Polish Notation tokens. Each RPN token for an Excel function is mapped to a class in POI, for example the token for the NOW() function corresponds to the class org.apache.poi.hssf.record.formula.functions.Now.

The other key point is that all Excel formula function classes implement the interface org.apache.poi.hssf.record.formula.functions.Function. This interface has a single method, Eval evaluate(Eval[] evals, int srcCellRow, short srcCellCol), where all the heavy lifting happens.

The existing Now function class implementation in POI clearly reveals that an actual implementation will be needed for us to use it:

 1 package org.apache.poi.hssf.record.formula.functions;
 2
 3 public class Now extends NotImplementedFunction {
 4
 5 }
 

Code Listing 3: The existing Now class.

Writing an implementation to replace the original code is a cinch, as shown by the following:

 1 package org.apache.poi.hssf.record.formula.functions;
 2
 3 import org.apache.poi.hssf.record.formula.eval.Eval;
 4 import org.apache.poi.hssf.record.formula.eval.NumberEval;
 5 import org.apache.poi.hssf.usermodel.HSSFDateUtil;
 6
 7 /**
 8  * Excel NOW() function implementation for POI.
 9  *
10  * @author Chip Killmar
11  */
12 public class Now extends NumericFunction {
13     public Eval evaluate(Eval[] evals, int srcCellRow, short srcCellCol) {
14         return new NumberEval(HSSFDateUtil.getExcelDate(new java.util.Date()));
15     }
16 }
 

Code Listing 4: The new Now class.

In theory we should be done at this point if it isn’t for what appears to be a small bug in org.apache.poi.hssf.record.formula.AbstractFunctionPtg that we’ll need to fix. The bug is that the metadata used to describe NOW() has the function accepting one parameter, when it should take none:

548 functionData[74][0]=new Byte(Ptg.CLASS_VALUE);functionData[74][1]=new byte[] {Ptg.CLASS_REF};functionData[74][2]=new Integer(1);
 

Code Listing 5: Metadata bug for NOW() has it taking one parameter.

549 // Chip Killmar
550 // The NOW() function takes no parameters, so change the third array element here to 0.
551 functionData[74][0]=new Byte(Ptg.CLASS_VALUE);functionData[74][1]=new byte[] {Ptg.CLASS_REF};functionData[74][2]=new Integer(0);
 

Code Listing 6: Fixed so that NOW() correctly takes zero parameters.

That’s it! Making sure our classes have precedence in the Java classpath, you’ll now see the following output:

Mon Apr 28 22:50:07 CDT 2008
Mon Apr 28 22:50:08 CDT 2008
Mon Apr 28 22:50:09 CDT 2008

Output Listing 2: Output from Code Listing 2 after our changes are in place.

In summary, it’s possible to achieve dynamic formula evaluation of Excel functions with POI. However, as the POI documentation states up front, and we’ve seen here, some functions haven’t been implemented yet. Fortunately, POI has a good design for formula function implementations in place, so it’s not terribly hard to do it yourself. And with some luck, future releases of POI will incorporate function implementations from the developer community and perhaps obtain full coverage of the Excel function set.

[Source code for this post can be downloaded here.]