Archive

Archive for February, 2010

A User Friendly Experience

February 7th, 2010 dan No comments

Let’s be honest for a second, when was the last time you saw a Windows user running something from the command prompt? Well, I do it occasionally, but I can’t say I remember seeing a non-IT person using the command prompt recently. So if you’re going to offer your users a Windows program, you better give them an icon to click and let them drag stuff onto it. And if something goes wrong, you better have a decent error message. This post will take the Pivot Table generation script developed in the last post and turn it into a user friendly Windows program with better flexibility and improved user experience.

The scripts developed previously could be run at the command line or by double clicking on the icon for the script line this.

20100207_commandexe

20100207_erpicon

This works because the input file name, ABCDCatering.xls, is hard coded within the script. In the real world, your users will have folders containing dozens of randomly named spreadsheets. If a user accidentally provides a corrupt spreadsheet, the program should keep cranking through the other files and let the user recover the damaged file later. The script developed in the last post will need some enhancements to make it more user friendly, including:

  • Provide support for multiple randomly named input spreadsheets
  • Add some simple message boxes and drag-and-drop support
  • Improve the error checking and error recovery to give the user feedback when something goes wrong

To keep things concise, this version of the script only allows the user to run the program by dragging and dropping files onto the program icon. Enhancing the script to also support command line operation is left as an exercise for the user. Let’s work through each of the usability issues below:

Multiple File Support
As I mentioned, Windows XP/Vista/7 users typically don’t interact with the command prompt. Instead, programs are run by clicking on their icons, either from the desktop, a folder, or the Start menu. A user specifies spreadsheets or document files by opening them in the application or dragging them onto the program icon on the desktop or in the Explorer window. You can also add the file names after the program name at the command prompt if needed.

To process multiple files, the program needs to process command line args, which are already conveniently available in the sys.argv list. Note that the first argument sys.argv[0] is used for the script name. The runexcel function is modified to pass sys.argv to the runexcel function, which loops through each of the input files.

if __name__ == "__main__":
    runexcel(sys.argv)
for fname in args[1:]:
    # Process spreadsheet files

The for loop wraps the wb = excel.Workbooks.Open(fname) call, the wb.SaveAs() call, and everything in between so each workbook is processed within the loop. After the loop finishes, a check for errors is made. If any errors occurred a warning and message box are issued.

Primitive GUI Support
Adding message boxes and providing basic drag-and-drop support adds a level of familiarity for Windows users. Python supports a large number of GUI frameworks, see http://wiki.python.org/moin/GuiProgramming for a comprehensive list. Building a complete graphic interface for this script is beyond the scope of this article, and isn’t really necessary anyway. Instead, you can add support for simple message boxes using the MessageBoxA function built into Windows. The basic pattern for calling a message box using this technique is to import ctypes and call windll.user32.MessageBoxA:

from ctypes import *
windll.user32.MessageBoxA(None,"My Message Box","Program Name",0)

This simple code produces a message box with the text “My Message Box”, an OK button, and “Program Name” as the top banner. When Python encounters windll.user32.MessageBoxA(), program execution pauses until the user clicks the OK button.

20100207_messagebox

Improve Error Checking
Lots of problems can happen when reading user spreadsheet data. The user can forget to specify an input file. They could try to have the script read a Word document or other non-spreadsheet file type. The spreadsheet might be corrupted. You need to bulletproof your script and guard against potential issues, both known and unknown.

Previous versions of the script made limited use of the try/except pattern to catch errors.

try:
    wb = excel.Workbooks.Open('ABCDCatering.xls')
except:
    print "Failed to open spreadsheet ABCDCatering.xls"
    sys.exit(1)

erppivotdragdrop.py makes more liberal use of try/except, wrapping more of the program code in the try block. If an error occurs, it can be handled more cleanly with nice warning messages. The downside of using try/except is that you lose the traceback message telling you where the error occurred. To get this information back, use the traceback module and the traceback.print_exc() function. One usage is to call traceback.print_exc() in the except block like this:

import traceback
try:
  a = 1/0
except:
  # Do error recovery
  traceback.print_exc()

Now exceptions are caught, handled, and a more detailed traceback is still available.

Running the script
Let’s test out the script. First, copy the script to the desktop and drag the ABCDCatering.xls spreadsheet onto the icon. Python starts running in the command window and begins processing the file you dragged. If everything ran successfully, you’ll see a series of messages and the “Finished” message box.

20100207_noerror

If a problem occurred, a message is displayed in the command window. At the end of the run, the message box is displayed letting you know that something bad happened and that you should review the error messages.

20100207_haserror

Here is the completed script, also available on GitHub

#
# erppivotdragdrop.py:
# Load raw EPR data, clean up header info,
# insert additional data fields and build 5 pivot tables
# Support drag and drop of multiple spreadsheets
#
import win32com.client as win32
win32c = win32.constants
import sys
import itertools
import re
import traceback
from ctypes import *

tablecount = itertools.count(1)

def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=""):
    """Build a pivot table using the provided source location data
    and specified fields
    """
    newsheet = wb.Sheets.Add()
    newsheet.Cells(1,1).Value = title
    newsheet.Cells(1,1).Font.Size = 16

    # Build the Pivot Table
    tname = "PivotTable%d"%tablecount.next()

    pc = wb.PivotCaches().Add(SourceType=win32c.xlDatabase,
                                 SourceData=sourcedata)
    pt = pc.CreatePivotTable(TableDestination="%s!R4C1"%newsheet.Name,
                             TableName=tname,
                             DefaultVersion=win32c.xlPivotTableVersion10)
    wb.Sheets(newsheet.Name).Select()
    wb.Sheets(newsheet.Name).Cells(3,1).Select()
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1

    wb.ActiveSheet.PivotTables(tname).AddDataField(
        wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
        sumvalue,
        win32c.xlSum)
    if len(sortfield) != 0:
        wb.ActiveSheet.PivotTables(tname).PivotFields(sortfield[0]).AutoSort(sortfield[1], sumvalue)
    newsheet.Name = title

    # Uncomment the next command to limit output file size, but make sure
    # to click Refresh Data on the PivotTable toolbar to update the table
    #
    # newsheet.PivotTables(tname).SaveData = False

    return tname

def runexcel(args):
    """Open the spreadsheet ABCDCatering.xls, clean it up,
    and add pivot tables
    """
    sawerror = False
    print "Running erppivotdragdrop"
    if len(args) == 1:
        windll.user32.MessageBoxA(None,"Error: Please drag at least one Excel file","erppivotdragdrop",0)
        sys.exit(1)
    try:
        excel = win32.gencache.EnsureDispatch('Excel.Application')
        for fname in args[1:]:
            if not re.search(r'\.(?i)xlsx?$',fname):
                print "Error: File %s doesn't seem to be an Excel file, expecting .xls or .xlsx file" % fname
                sawerror = True
                continue
            if not re.match('[A-Za-z]:',fname):
                print "Error: erppivotdragdrop doesn't support command line execution"
                print "       Please drag and drop the Excel file onto the program icon"
                sawerror = True
                continue
            print "Processing %s" % fname
            try:
                wb = excel.Workbooks.Open(fname)
            except:
                print "Failed to open Excel file %s, skipping" % fname
                sawerror = True
                continue

            try:
                ws = wb.Sheets('Sheet1')
            except:
                print "Failed to open Sheet 'Sheet1' in file %s, skipping" % fname
                wb.Close()
                sawerror = True
                continue

            xldata = ws.UsedRange.Value
            newdata = []
            for row in xldata:
                if len(row) == 13 and row[-1] is not None:
                    newdata.append(list(row))
            lasthdr = "Col A"
            for i,field in enumerate(newdata[0]):
                if field is None:
                    newdata[0][i] = lasthdr + " Name"
                else:
                    lasthdr = newdata[0][i]

            logolookup = {'Applied Materials':'AMAT', 'Electronic Arts':'EA',
                          'Hewlett-Packard':'HP', 'KLA-Tencor':'KLA'}
            if ("Company Name" in newdata[0]):
                cindx = newdata[0].index("Company Name")
                newdata[0][cindx+1:cindx+1] = ["Logo Name"]
                for rcnt in range(1,len(newdata)):
                    if newdata[rcnt][cindx] in logolookup:
                        newdata[rcnt][cindx+1:cindx+1] = [logolookup[newdata[rcnt][cindx]]]
                    else:
                        newname = newdata[rcnt][cindx].split()[0]
                        newdata[rcnt][cindx+1:cindx+1] = [newname]
                        logolookup[newdata[rcnt][cindx]] = newname

            foodlookup = {'Caesar Salad':'Salad', 'Cheese Pizza':'Pizza',
                          'Cheeseburger':'Burger', 'Chocolate Sundae':'Dessert',
                          'Churro':'Snack', 'Hamburger':'Burger', 'Hot Dog':'HotDog',
                          'Pepperoni Pizza':'Pizza', 'Potato Chips':'Snack',
                          'Soda':'Drink'}
            if ("Food Name" in newdata[0]):
                cindx = newdata[0].index("Food Name")
                newdata[0][cindx+1:cindx+1] = ["Food Category"]
                for rcnt in range(1,len(newdata)):
                    if newdata[rcnt][cindx] in foodlookup:
                        newdata[rcnt][cindx+1:cindx+1] = [foodlookup[newdata[rcnt][cindx]]]
                    else:
                        newdata[rcnt][cindx+1:cindx+1] = ['UNDEFINED']

            rowcnt = len(newdata)
            colcnt = len(newdata[0])
            wsnew = wb.Sheets.Add()
            wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(rowcnt,colcnt)).Value = newdata
            wsnew.Columns.AutoFit()

            src = "%s!R1C1:R%dC%d"%(wsnew.Name,rowcnt,colcnt)

            # What were the total sales in each of the last four quarters?
            addpivot(wb,src,
                     title="Sales by Quarter",
                     filters=(),
                     columns=(),
                     rows=("Fiscal Quarter",),
                     sumvalue="Sum of Net Booking",
                     sortfield=())

            # What are the sales for each food item in each quarter?
            addpivot(wb,src,
                     title="Sales by Food Item",
                     filters=(),
                     columns=("Food Name",),
                     rows=("Fiscal Quarter",),
                     sumvalue="Sum of Net Booking",
                     sortfield=())

            # Who were the top 10 customers for ABCD Catering in 2009?
            addpivot(wb,src,
                     title="Top 10 Customers",
                     filters=(),
                     columns=(),
                     rows=("Company Name",),
                     sumvalue="Sum of Net Booking",
                     sortfield=("Company Name",win32c.xlDescending))

            # Who was the highest producing sales rep for the year?
            addpivot(wb,src,
                     title="Top Sales Reps",
                     filters=(),
                     columns=(),
                     rows=("Sales Rep Name","Company Name"),
                     sumvalue="Sum of Net Booking",
                     sortfield=("Sales Rep Name",win32c.xlDescending))

            # What food item had the highest unit sales in Q4?
            ptname = addpivot(wb,src,
                     title="Unit Sales by Food",
                     filters=("Fiscal Quarter",),
                     columns=(),
                     rows=("Food Name",),
                     sumvalue="Sum of Quantity",
                     sortfield=("Food Name",win32c.xlDescending))
            wb.Sheets("Unit Sales by Food").PivotTables(ptname).PivotFields("Fiscal Quarter").CurrentPage = "2009-Q4"

            # What food category had the highest unit sales in Q4?
            ptname = addpivot(wb,src,
                     title="Unit Sales by Food Category",
                     filters=("Fiscal Quarter",),
                     columns=(),
                     rows=("Food Category",),
                     sumvalue="Sum of Quantity",
                     sortfield=("Food Category",win32c.xlDescending))
            wb.Sheets("Unit Sales by Food Category").PivotTables(ptname).PivotFields("Fiscal Quarter").CurrentPage = "2009-Q4"

            outfname = re.sub('(?i)\.xlsx?','_new',fname)
            try:
                if int(float(excel.Version)) >= 12:
                    wb.SaveAs(outfname+'.xlsx',win32c.xlOpenXMLWorkbook)
                    print "Wrote %s" % outfname+'.xlsx'
                else:
                    wb.SaveAs(outfname+'.xls')
                    print "Wrote %s" % outfname+'.xls'
            except:
                print "Error: Problem during file save"
                sawerror = True
            wb.Close()
        if sawerror:
            print "Errors occurred, please check the above messages"
            windll.user32.MessageBoxA(None,"Error: Problems occurred, please check them and try again","erppivotdragdrop",0)
        else:
            print "Finished"
            windll.user32.MessageBoxA(None,"Finished","erppivotdragdrop",0)
    except:
        traceback.print_exc()
        print "Errors occurred, please check the above messages"
        windll.user32.MessageBoxA(None,"Error: Problems occurred, please check them and try again","erppivotdragdrop",0)
    excel.Application.Quit()

if __name__ == "__main__":
    runexcel(sys.argv)

Prerequisites
Python (refer to http://www.python.org)

Win32 Python module (refer to http://sourceforge.net/projects/pywin32)

Microsoft Excel (refer to http://office.microsoft.com/excel)

Source Files and Scripts
Source for the program erppivotextended.py and spreadsheet file ABCDCatering.xls are available at
http://github.com/pythonexcels/examples

Thanks — Dan

Categories: Uncategorized Tags: