<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Python Excels &#187; Excel</title>
	<atom:link href="http://www.pythonexcels.com/category/excel/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pythonexcels.com</link>
	<description>Data Mining with Excel and Python</description>
	<lastBuildDate>Mon, 08 Feb 2010 03:56:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Extending Pivot Table Data</title>
		<link>http://www.pythonexcels.com/2009/12/extending-pivot-table-data/</link>
		<comments>http://www.pythonexcels.com/2009/12/extending-pivot-table-data/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 19:20:17 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[ERP]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.pythonexcels.com/?p=248</guid>
		<description><![CDATA[As shown in the last post, automating pivot table generation with Python and Excel helps you quickly clean up a spreadsheet, organize data and build useful reports in very few lines of code.  Another useful data preparation technique is to build new columns of information based on the available data.  For example, you [...]]]></description>
			<content:encoded><![CDATA[<p>As shown in the <a href="http://www.pythonexcels.com/2009/11/automating-pivot-tables-with-python/">last post</a>, automating pivot table generation with Python and Excel helps you quickly clean up a spreadsheet, organize data and build useful reports in very few lines of code.  Another useful data preparation technique is to build new columns of information based on the available data.  For example, you could add an industry segment column to group company names by industry, or add an item type column to group sales items by category.  While Excel does have some functions to help with adding new data fields, automation with Python eliminates the tedium of clicking column names and entering formulas.</p>
<p>Excel does provide a function for calculating new values within a pivot table.  One use is extending a pivot table containing pricing and quantity data to compute an average selling price.  Given the table below:</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/12/20091203_salesbyqtr.png" alt="20091203_salesbyqtr" title="20091203_salesbyqtr" width="392" height="323" class="alignnone size-full wp-image-249" /></p>
<p>a new label called &#8220;ASP&#8221;, which is the Net Booking divided by the Quantity, can be added quickly and easily with Excel&#8217;s Calculated Field capability.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/12/20091203_calcfield.png" alt="20091203_calcfield" title="20091203_calcfield" width="549" height="612" class="alignnone size-full wp-image-250" /></p>
<p>This feature is handy for adding labels on the fly that require a simple calculation.  </p>
<p>In other cases, deriving the new field may not be so simple, yet needs to be performed each time the spreadsheet is updated.  Python can programmatically add new data fields to the source table so that the data is ready for viewing whenever the pivot table is opened.</p>
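<p>As a rough illustration of adding a derived field to the source table in Python, the sketch below appends a per-row ASP column to a small table.  The rows are made-up sample data, not values from the catering spreadsheet.  Note that a per-row ASP is not identical to the calculated field above, which divides the summed Net Booking by the summed Quantity.</p>
<pre class="brush: python;">
# Sketch: append an ASP column (Net Booking / Quantity) to a small table.
# The rows below are made-up sample data for illustration only.
table = [['Food Name', 'Quantity', 'Net Booking'],
         ['Soda', 200, 300.0],
         ['Churro', 50, 125.0],
         ['Hot Dog', 0, 0.0]]

header = table[0]
qidx = header.index('Quantity')
bidx = header.index('Net Booking')
header.append('ASP')
for row in table[1:]:
    qty = row[qidx]
    # Guard against division by zero for rows with no units sold
    row.append(row[bidx] / qty if qty else None)

for row in table:
    print(row)
</pre>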
<p>The script developed last time automated the data cleanup and pivot table generation tasks.  Doing some further analysis based on the output spreadsheet, I created a chart of the Top 10 Customers for ABCD Catering: </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/12/20091203_top10chart.png" alt="20091203_top10chart" title="20091203_top10chart" width="550" height="353" class="alignnone size-full wp-image-251" /></p>
<p>Note that some of the company names are 15 characters or longer and occupy much of the chart space.  It would be nice to have a shorter &#8220;nickname&#8221; for each company that could be used in the charts.  One solution is to cut and paste the pivot table data, then modify the Company Name information by hand.  Unfortunately, this would be very tedious.  Another approach is to automate the process in the script and create a new column derived from a comprehensive reference table of company names and nicknames.  The downside is that the reference table becomes harder to maintain as the business and its list of customers grow.  A third method is to create an algorithm that uses the first word in the company name wherever possible, and uses a defined nickname for the remaining special cases.  &#8220;Sun Microsystems&#8221; becomes &#8220;Sun&#8221; and &#8220;Cisco Systems&#8221; becomes &#8220;Cisco&#8221;, while other company names such as &#8220;Hewlett-Packard&#8221; could be listed in a lookup with a nickname such as &#8220;HP&#8221;.  The snippet below shows how this is done.</p>
<pre class="brush: python;">
logolookup = {'Applied Materials':'AMAT', 'Electronic Arts':'EA',
              'Hewlett-Packard':'HP', 'KLA-Tencor':'KLA'}
if (&quot;Company Name&quot; in newdata[0]):
    cindx = newdata[0].index(&quot;Company Name&quot;)
    newdata[0][cindx+1:cindx+1] = [&quot;Logo Name&quot;]
    for rcnt in range(1,len(newdata)):
        if newdata[rcnt][cindx] in logolookup:
            newdata[rcnt][cindx+1:cindx+1] = [logolookup[newdata[rcnt][cindx]]]
        else:
            newname = newdata[rcnt][cindx].split()[0]
            newdata[rcnt][cindx+1:cindx+1] = [newname]
            logolookup[newdata[rcnt][cindx]] = newname
</pre>
<p>This code begins with a simple lookup for company names that can be easily extended as special-case company names are added.  Next, the column location of the &#8220;Company Name&#8221; field is identified and the new header &#8220;Logo Name&#8221; is inserted after &#8220;Company Name&#8221; in the list using the <code>list[index:index]</code> construct.  The <code>for</code> loop iterates over each row in the table, checking whether the company name for that row exists in the <code>logolookup</code> dictionary.  If it does, the nickname from the dictionary is inserted.  If not, the original company name is <code>split()</code> into words and the first word is used as the new abbreviated name.  Finally, the <code>logolookup</code> dictionary is updated with the new abbreviated name so that later rows for the same company find it directly.  </p>
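<p>For readers unfamiliar with slice assignment, here is a minimal standalone sketch of the <code>list[index:index]</code> insertion and the first-word fallback.  The header names other than &#8220;Company Name&#8221; and &#8220;Logo Name&#8221; are made up for the illustration.</p>
<pre class="brush: python;">
# Sketch: assigning to an empty slice inserts without replacing anything.
# 'Company ID' and 'Net Booking' are placeholder header names.
header = ['Company ID', 'Company Name', 'Net Booking']
cindx = header.index('Company Name')
header[cindx+1:cindx+1] = ['Logo Name']
print(header)

# list.insert() gives the same result for a single value
other = ['Company ID', 'Company Name', 'Net Booking']
other.insert(cindx + 1, 'Logo Name')
print(other == header)

# First-word fallback: fine for some names, unhelpful for others,
# which is why the special cases live in the lookup dictionary
print('Sun Microsystems'.split()[0])
print('Hewlett-Packard'.split()[0])
</pre>
<p>The slice form and <code>insert()</code> are interchangeable here; the slice form also accepts several new values at once.</p>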
<p>After running the program, the new column &#8220;Logo Name&#8221; has been inserted after &#8220;Company Name&#8221; and contains the shortened company names.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/12/20091203_withlogo.png" alt="20091203_withlogo" title="20091203_withlogo" width="431" height="273" class="alignnone size-full wp-image-252" /></p>
<p>The new &#8220;Logo Name&#8221; column can be used in the previous pivot table and chart, replacing the &#8220;Company Name&#8221; field and producing a cleaner chart with less area used for displaying company name information.<br />
<img src="http://www.pythonexcels.com/wp-content/uploads/2009/12/20091203_top10wlogo.png" alt="20091203_top10wlogo" title="20091203_top10wlogo" width="550" height="354" class="alignnone size-full wp-image-253" /> </p>
<p>Another use of this technique is to add a label for &#8220;Food Category&#8221; based on the type of food purchased.  For example, the food items sold by ABCD Catering are: Caesar Salad, Cheese Pizza, Cheeseburger, Chocolate Sundae, Churro, Hamburger, Hot Dog, Pepperoni Pizza, Potato Chips and Soda.  Let&#8217;s say that your manager wants to track the sales of different food categories, such as Burger, Dessert, HotDog, Drink, Pizza, Salad and Snack.  Using the same technique outlined above, this code will add a column for Food Category with the appropriate entry for each food item: </p>
<pre class="brush: python;">
foodlookup = {'Caesar Salad':'Salad', 'Cheese Pizza':'Pizza',
              'Cheeseburger':'Burger', 'Chocolate Sundae':'Dessert',
              'Churro':'Snack', 'Hamburger':'Burger', 'Hot Dog':'HotDog',
              'Pepperoni Pizza':'Pizza', 'Potato Chips':'Snack',
              'Soda':'Drink'}
if (&quot;Food Name&quot; in newdata[0]):
    cindx = newdata[0].index(&quot;Food Name&quot;)
    newdata[0][cindx+1:cindx+1] = [&quot;Food Category&quot;]
    for rcnt in range(1,len(newdata)):
        if newdata[rcnt][cindx] in foodlookup:
            newdata[rcnt][cindx+1:cindx+1] = [foodlookup[newdata[rcnt][cindx]]]
        else:
            newdata[rcnt][cindx+1:cindx+1] = ['UNDEFINED']
</pre>
<p>If a food item is not found in the lookup, the category is labeled UNDEFINED.  This is an indication that there is a problem with the script and the lookup for food categories needs to be extended. </p>
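<p>Since the company and food snippets share the same shape, one possible refactoring is a small helper that inserts the new column and also reports which values had no lookup entry.  This is only a sketch: the function name <code>add_lookup_column</code> and the food item &#8220;Nachos&#8221; are hypothetical and do not appear in the script.</p>
<pre class="brush: python;">
# Sketch of a reusable helper (hypothetical, not part of the original script).
# Inserts column newname right after column source and returns the set of
# source values that were missing from the lookup.
def add_lookup_column(table, source, newname, lookup, default='UNDEFINED'):
    header = table[0]
    if source not in header:
        return set()
    cindx = header.index(source)
    header.insert(cindx + 1, newname)
    missing = set()
    for row in table[1:]:
        key = row[cindx]
        if key not in lookup:
            missing.add(key)
        row.insert(cindx + 1, lookup.get(key, default))
    return missing

foodlookup = {'Soda': 'Drink', 'Churro': 'Snack'}
table = [['Food Name', 'Quantity'],
         ['Soda', 10],
         ['Nachos', 4]]   # 'Nachos' is a made-up item with no lookup entry
missing = add_lookup_column(table, 'Food Name', 'Food Category', foodlookup)
print(table)
print(sorted(missing))
</pre>
<p>Printing the missing names makes it easier to see which entries need to be added to the lookup, rather than hunting for UNDEFINED in the output spreadsheet.</p>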
<p>The section of the script which creates the pivot tables can be easily extended to build a new table based on the newly created label &#8220;Food Category&#8221;:</p>
<pre class="brush: python;">
# What food category had the highest unit sales in Q4?
ptname = addpivot(wb,src,
         title=&quot;Unit Sales by Food Category&quot;,
         filters=(&quot;Fiscal Quarter&quot;,),
         columns=(),
         rows=(&quot;Food Category&quot;,),
         sumvalue=&quot;Sum of Quantity&quot;,
         sortfield=(&quot;Food Category&quot;,win32c.xlDescending))
wb.Sheets(&quot;Unit Sales by Food Category&quot;).PivotTables(ptname).PivotFields(&quot;Fiscal Quarter&quot;).CurrentPage = &quot;2009-Q4&quot;
</pre>
<p>Based on the output spreadsheet, the best selling food category in Q4 based on quantity is &#8220;Snack&#8221;, with sales of 13700 units.  </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/12/20091203_foodcategory.png" alt="20091203_foodcategory" title="20091203_foodcategory" width="333" height="381" class="alignnone size-full wp-image-254" /></p>
<p>Here is the completed script, also available on <a href="http://github.com/pythonexcels/examples">GitHub</a>:</p>
<pre class="brush: python;">
#
# erppivotextended.py:
# Load raw ERP data, clean up header info,
# insert additional data fields and build 6 pivot tables
#
import win32com.client as win32
win32c = win32.constants
import sys
import itertools
tablecount = itertools.count(1)

def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=&quot;&quot;):
    &quot;&quot;&quot;Build a pivot table using the provided source location data
    and specified fields
    &quot;&quot;&quot;
    newsheet = wb.Sheets.Add()
    newsheet.Cells(1,1).Value = title
    newsheet.Cells(1,1).Font.Size = 16

    # Build the Pivot Table
    tname = &quot;PivotTable%d&quot;%next(tablecount)

    pc = wb.PivotCaches().Add(SourceType=win32c.xlDatabase,
                                 SourceData=sourcedata)
    pt = pc.CreatePivotTable(TableDestination=&quot;%s!R4C1&quot;%newsheet.Name,
                             TableName=tname,
                             DefaultVersion=win32c.xlPivotTableVersion10)
    wb.Sheets(newsheet.Name).Select()
    wb.Sheets(newsheet.Name).Cells(3,1).Select()
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1

    wb.ActiveSheet.PivotTables(tname).AddDataField(
        wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
        sumvalue,
        win32c.xlSum)
    if len(sortfield) != 0:
        wb.ActiveSheet.PivotTables(tname).PivotFields(sortfield[0]).AutoSort(sortfield[1], sumvalue)
    newsheet.Name = title

    # Uncomment the next command to limit output file size, but make sure
    # to click Refresh Data on the PivotTable toolbar to update the table
    # newsheet.PivotTables(tname).SaveData = False

    return tname

def runexcel():
    &quot;&quot;&quot;Open the spreadsheet ABCDCatering.xls, clean it up,
    and add pivot tables
    &quot;&quot;&quot;
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    #excel.Visible = True
    try:
        wb = excel.Workbooks.Open('ABCDCatering.xls')
    except Exception:
        print(&quot;Failed to open spreadsheet ABCDCatering.xls&quot;)
        sys.exit(1)
    ws = wb.Sheets('Sheet1')
    xldata = ws.UsedRange.Value
    newdata = []
    for row in xldata:
        if len(row) == 13 and row[-1] is not None:
            newdata.append(list(row))
    lasthdr = &quot;Col A&quot;
    for i,field in enumerate(newdata[0]):
        if field is None:
            newdata[0][i] = lasthdr + &quot; Name&quot;
        else:
            lasthdr = newdata[0][i]

    logolookup = {'Applied Materials':'AMAT', 'Electronic Arts':'EA',
                  'Hewlett-Packard':'HP', 'KLA-Tencor':'KLA'}
    if (&quot;Company Name&quot; in newdata[0]):
        cindx = newdata[0].index(&quot;Company Name&quot;)
        newdata[0][cindx+1:cindx+1] = [&quot;Logo Name&quot;]
        for rcnt in range(1,len(newdata)):
            if newdata[rcnt][cindx] in logolookup:
                newdata[rcnt][cindx+1:cindx+1] = [logolookup[newdata[rcnt][cindx]]]
            else:
                newname = newdata[rcnt][cindx].split()[0]
                newdata[rcnt][cindx+1:cindx+1] = [newname]
                logolookup[newdata[rcnt][cindx]] = newname

    foodlookup = {'Caesar Salad':'Salad', 'Cheese Pizza':'Pizza',
                  'Cheeseburger':'Burger', 'Chocolate Sundae':'Dessert',
                  'Churro':'Snack', 'Hamburger':'Burger', 'Hot Dog':'HotDog',
                  'Pepperoni Pizza':'Pizza', 'Potato Chips':'Snack',
                  'Soda':'Drink'}
    if (&quot;Food Name&quot; in newdata[0]):
        cindx = newdata[0].index(&quot;Food Name&quot;)
        newdata[0][cindx+1:cindx+1] = [&quot;Food Category&quot;]
        for rcnt in range(1,len(newdata)):
            if newdata[rcnt][cindx] in foodlookup:
                newdata[rcnt][cindx+1:cindx+1] = [foodlookup[newdata[rcnt][cindx]]]
            else:
                newdata[rcnt][cindx+1:cindx+1] = ['UNDEFINED']

    rowcnt = len(newdata)
    colcnt = len(newdata[0])
    wsnew = wb.Sheets.Add()
    wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(rowcnt,colcnt)).Value = newdata
    wsnew.Columns.AutoFit()

    src = &quot;%s!R1C1:R%dC%d&quot;%(wsnew.Name,rowcnt,colcnt)

    # What were the total sales in each of the last four quarters?
    addpivot(wb,src,
             title=&quot;Sales by Quarter&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # What are the sales for each food item in each quarter?
    addpivot(wb,src,
             title=&quot;Sales by Food Item&quot;,
             filters=(),
             columns=(&quot;Food Name&quot;,),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # Who were the top 10 customers for ABCD Catering in 2009?
    addpivot(wb,src,
             title=&quot;Top 10 Customers&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Company Name&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Company Name&quot;,win32c.xlDescending))

    # Who was the highest producing sales rep for the year?
    addpivot(wb,src,
             title=&quot;Top Sales Reps&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Sales Rep Name&quot;,&quot;Company Name&quot;),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Sales Rep Name&quot;,win32c.xlDescending))

    # What food item had the highest unit sales in Q4?
    ptname = addpivot(wb,src,
             title=&quot;Unit Sales by Food&quot;,
             filters=(&quot;Fiscal Quarter&quot;,),
             columns=(),
             rows=(&quot;Food Name&quot;,),
             sumvalue=&quot;Sum of Quantity&quot;,
             sortfield=(&quot;Food Name&quot;,win32c.xlDescending))
    wb.Sheets(&quot;Unit Sales by Food&quot;).PivotTables(ptname).PivotFields(&quot;Fiscal Quarter&quot;).CurrentPage = &quot;2009-Q4&quot;

    # What food category had the highest unit sales in Q4?
    ptname = addpivot(wb,src,
             title=&quot;Unit Sales by Food Category&quot;,
             filters=(&quot;Fiscal Quarter&quot;,),
             columns=(),
             rows=(&quot;Food Category&quot;,),
             sumvalue=&quot;Sum of Quantity&quot;,
             sortfield=(&quot;Food Category&quot;,win32c.xlDescending))
    wb.Sheets(&quot;Unit Sales by Food Category&quot;).PivotTables(ptname).PivotFields(&quot;Fiscal Quarter&quot;).CurrentPage = &quot;2009-Q4&quot;

    if int(float(excel.Version)) &gt;= 12:
        wb.SaveAs('newABCDCatering.xlsx',win32c.xlOpenXMLWorkbook)
    else:
        wb.SaveAs('newABCDCatering.xls')
    excel.Application.Quit()

if __name__ == &quot;__main__&quot;:
    runexcel()
</pre>
<p><strong>Prerequisites</strong><br />
Python (refer to <a href="http://www.python.org">http://www.python.org</a>)</p>
<p>Win32 Python module (refer to <a href="http://sourceforge.net/projects/pywin32">http://sourceforge.net/projects/pywin32</a>)</p>
<p>Microsoft Excel (refer to <a href="http://office.microsoft.com/excel">http://office.microsoft.com/excel</a>)</p>
<p><strong>Source Files and Scripts</strong><br />
Source for the program erppivotextended.py and spreadsheet file ABCDCatering.xls are available at<br /><a href="http://github.com/pythonexcels/examples">http://github.com/pythonexcels/examples</a></p>
<p>Thanks &#8212; Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythonexcels.com/2009/12/extending-pivot-table-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Automating Pivot Tables with Python</title>
		<link>http://www.pythonexcels.com/2009/11/automating-pivot-tables-with-python/</link>
		<comments>http://www.pythonexcels.com/2009/11/automating-pivot-tables-with-python/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 03:49:01 +0000</pubDate>
		<dc:creator>dan</dc:creator>
				<category><![CDATA[ERP]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.pythonexcels.com/?p=237</guid>
		<description><![CDATA[In the last post I explained the basic concept behind Pivot Tables and provided some examples.  Pivot tables are an easy-to-use tool to derive some basic business intelligence from your data.  As discussed last time, there are occasions when you&#8217;ll need to do interactive data mining by changing column and row fields.  [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.pythonexcels.com/2009/11/introducing-pivot-tables">last post</a> I explained the basic concept behind Pivot Tables and provided some examples.  Pivot tables are an easy-to-use tool to derive some basic business intelligence from your data.  As discussed last time, there are occasions when you&#8217;ll need to do interactive data mining by changing column and row fields.  But in my experience, it&#8217;s handy to have my favorite reports built automatically, with the reports ready to go as soon as I open the spreadsheet.  In this post I&#8217;ll develop and explain the code to create a set of pivot tables automatically in a worksheet. </p>
<p>The goal of this exercise is to automate the generation of pivot tables from the last post, and save them to a new Excel file.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091123_reports.png" alt="20091123_reports" title="20091123_reports" width="550" height="655" class="alignnone size-full wp-image-242" /></p>
<p>I started with the file <code>newABCDCatering.xls</code> from the previous post and recorded a macro to create this simple pivot table showing Net Bookings by Sales Rep and Food Name for the last four quarters.<br />
<img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091123_setup.png" alt="20091123_setup" title="20091123_setup" width="550" height="396" class="alignnone size-full wp-image-238" /></p>
<p>Captured in Excel 2007, the recorded macro looks like this: </p>
<pre class="brush: vb;">
Sub Macro1()
'
' Macro1 Macro
'

'
    Selection.CurrentRegion.Select
    Sheets.Add
    ActiveWorkbook.PivotCaches.Create(SourceType:=xlDatabase, SourceData:= _
        &quot;Sheet2!R1C1:R791C13&quot;, Version:=xlPivotTableVersion10).CreatePivotTable _
        TableDestination:=&quot;Sheet3!R3C1&quot;, TableName:=&quot;PivotTable1&quot;, DefaultVersion _
        :=xlPivotTableVersion10
    Sheets(&quot;Sheet3&quot;).Select
    Cells(3, 1).Select
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Fiscal Year&quot;)
        .Orientation = xlPageField
        .Position = 1
    End With
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Fiscal Quarter&quot;)
        .Orientation = xlColumnField
        .Position = 1
    End With
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Sales Rep Name&quot;)
        .Orientation = xlRowField
        .Position = 1
    End With
    With ActiveSheet.PivotTables(&quot;PivotTable1&quot;).PivotFields(&quot;Food Name&quot;)
        .Orientation = xlRowField
        .Position = 2
    End With
    ActiveSheet.PivotTables(&quot;PivotTable1&quot;).AddDataField ActiveSheet.PivotTables( _
        &quot;PivotTable1&quot;).PivotFields(&quot;Net Booking&quot;), &quot;Sum of Net Booking&quot;, xlSum
End Sub
</pre>
<p>The post <a href="http://www.pythonexcels.com/2009/10/mapping-excel-vb-macros-to-python/">Mapping Excel VB Macros to Python</a> covered a technique for recording a Visual Basic macro and porting it to Python.  Using that approach, you could simply turn on the macro recorder and generate all the required tables, producing a long script with lots of redundancy.  A better approach is to build a general purpose function that can be used over and over to generate the pivot tables.</p>
<p>Looking at the macro, you see lines specifying the <code>Orientation</code> of the field name, such as <code>.Orientation = xlRowField</code> and <code>.Orientation = xlColumnField</code>.  A pivot table has four basic areas for fields: </p>
<ul>
<li>Report Filter (<code>.Orientation = xlPageField</code>)</li>
<li>Column area (<code>.Orientation = xlColumnField</code>)</li>
<li>Row area (<code>.Orientation = xlRowField</code>)</li>
<li>Values area (<code>PivotTables().AddDataField()</code>)</li>
</ul>
<p>Each of these supports multiple fields (row fields for <code>Sales Rep Name</code> and <code>Food Name</code> were added in the example).  The ordering of the fields changes the appearance of the table.  </p>
<p>A general pattern should be apparent in this macro.  First, the pivot table is created with the <code>ActiveWorkbook.PivotCaches.Create()</code> statement.  Next, the report filter, columns and rows are configured with a series of <code>ActiveSheet.PivotTables("PivotTable1").PivotFields()</code> statements.  Finally, the field used in the <code>Values</code> section of the table is configured using the <code>ActiveSheet.PivotTables("PivotTable1").AddDataField</code> statement.  The general purpose function will need to contain all of these constructs.  Note the parts that can&#8217;t be hard-coded in the general solution: the source of the data, <code>"Sheet2!R1C1:R791C13"</code>, and the destination for the table, <code>"Sheet3!R3C1"</code>, need to be determined from the characteristics of the source data.</p>
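<p>As a small sketch of how the source range can be derived rather than hard-coded, the R1C1 string can be built from the dimensions of the data table.  The sample rows and the sheet name below are placeholders; in a real script the name would come from the worksheet object.</p>
<pre class="brush: python;">
# Sketch: build the R1C1 source range from the table dimensions.
# newdata holds made-up rows; sheetname stands in for the worksheet's Name.
newdata = [['Fiscal Quarter', 'Net Booking'],
           ['2009-Q3', 100.0],
           ['2009-Q4', 250.0]]
sheetname = 'Sheet2'

rowcnt = len(newdata)
colcnt = len(newdata[0])
src = '%s!R1C1:R%dC%d' % (sheetname, rowcnt, colcnt)
print(src)   # Sheet2!R1C1:R3C2
</pre>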
<p>In Python, the repeated field configuration statements can be reduced to the following loop that covers fields for the Report Filter, Columns and Rows:</p>
<pre class="brush: python;">
def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=&quot;&quot;):
    &quot;&quot;&quot;Build a pivot table using the provided source location data
    and specified fields
    &quot;&quot;&quot;
    ...
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1
    ...
</pre>
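<p>To see what such a loop assigns without starting Excel, the sketch below records each (field, orientation, position) triple in a list.  The function name <code>plan_fields</code> is hypothetical, and plain strings stand in for the <code>win32c</code> constants.</p>
<pre class="brush: python;">
# Sketch: trace the assignments made by the nested loop.
# Strings replace the win32c constants so this runs without Excel.
def plan_fields(filters=(), columns=(), rows=()):
    plan = []
    for fieldlist, fieldc in ((filters, 'xlPageField'),
                              (columns, 'xlColumnField'),
                              (rows, 'xlRowField')):
        for i, val in enumerate(fieldlist):
            # Position is 1-based within each area of the pivot table
            plan.append((val, fieldc, i + 1))
    return plan

# Same layout as the recorded macro
for step in plan_fields(filters=('Fiscal Year',),
                        columns=('Fiscal Quarter',),
                        rows=('Sales Rep Name', 'Food Name')):
    print(step)
</pre>
<p>The four triples correspond to the four <code>With</code> blocks in the recorded macro, with &#8220;Food Name&#8221; landing in position 2 of the row area.</p>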
<p>Processing the Values field is more or less copied from the Visual Basic.  To keep things simple in this example, this code is limited to adding &#8220;Sum of&#8221; values only, and doesn&#8217;t handle other Summarize Value functions such as Count, Min, Max, etc.</p>
<pre class="brush: python;">
    wb.ActiveSheet.PivotTables(tname).AddDataField(
        wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
        sumvalue,
        win32c.xlSum)
</pre>
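<p>The slice <code>sumvalue[7:]</code> works because the prefix &#8220;Sum of &#8221; is exactly seven characters long.  If other summary functions were needed, one possible extension is to split the caption on a table of known prefixes.  The sketch below is an assumption about how that might look, not part of the script; it returns the names of Excel constants as strings, which a real script would look up on <code>win32c</code>.</p>
<pre class="brush: python;">
# Sketch: split a caption such as 'Sum of Net Booking' into field and function.
# FUNCTIONS and parse_caption are hypothetical helpers for illustration.
FUNCTIONS = {'Sum of ': 'xlSum',
             'Count of ': 'xlCount',
             'Average of ': 'xlAverage'}

def parse_caption(caption):
    for prefix, funcname in FUNCTIONS.items():
        if caption.startswith(prefix):
            return caption[len(prefix):], funcname
    raise ValueError('Unsupported caption: %r' % caption)

print('Sum of Net Booking'[7:])
print(parse_caption('Sum of Net Booking'))
print(parse_caption('Count of Quantity'))
</pre>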
<p>The actual values for <code>filters</code>, <code>columns</code> and <code>rows</code> are supplied in the call to the function.  The complete function creates a new sheet within the workbook, then adds an empty pivot table to the sheet and builds the table using the field information provided.  For example, to answer the question: <i>What were the total sales in each of the last four quarters?</i>, the pivot table is built with the following call to the <code>addpivot</code> function:</p>
<pre class="brush: python;">
# What were the total sales in each of the last four quarters?
addpivot(wb,src,
         title=&quot;Sales by Quarter&quot;,
         filters=(),
         columns=(),
         rows=(&quot;Fiscal Quarter&quot;,),
         sumvalue=&quot;Sum of Net Booking&quot;,
         sortfield=())
</pre>
<p>which defines a pivot table using the row header &#8220;Fiscal Quarter&#8221; and data value &#8220;Sum of Net Booking&#8221;.  The title &#8220;Sales by Quarter&#8221; is used to name the sheet itself.</p>
<p>To make the output spreadsheet more understandable, the title parameter is passed into the function and used both as a title in each worksheet and as the tab name.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091123_titletabsbq.png" alt="20091123_titletabsbq" title="20091123_titletabsbq" width="389" height="363" class="alignnone size-full wp-image-243" /></p>
<p>The complete script is shown below.  Caveats:</p>
<ul>
<li>This script has been modified to run on both Excel 2007 and Excel 2003 and has been tested on those versions.</li>
<li>Adding pivot tables increases the size of the output Excel file, which can be mitigated by disabling caching of pivot table data.  Line 48 of the script contains the command <code>newsheet.PivotTables(tname).SaveData = False</code>, which has been commented out.  Uncommenting this command will reduce the size of the output Excel file, but will require that the pivot table be refreshed before use by clicking on Refresh Data on the PivotTable toolbar.</li>
</ul>
<pre class="brush: python;">
#
# erpdatapivot.py:
# Load raw ERP data, clean up header info and
# build 5 pivot tables
#
import win32com.client as win32
win32c = win32.constants
import sys
import itertools
tablecount = itertools.count(1)

def addpivot(wb,sourcedata,title,filters=(),columns=(),
             rows=(),sumvalue=(),sortfield=&quot;&quot;):
    &quot;&quot;&quot;Build a pivot table using the provided source location data
    and specified fields
    &quot;&quot;&quot;
    newsheet = wb.Sheets.Add()
    newsheet.Cells(1,1).Value = title
    newsheet.Cells(1,1).Font.Size = 16

    # Build the Pivot Table
    tname = &quot;PivotTable%d&quot;%next(tablecount)

    pc = wb.PivotCaches().Add(SourceType=win32c.xlDatabase,
                                 SourceData=sourcedata)
    pt = pc.CreatePivotTable(TableDestination=&quot;%s!R4C1&quot;%newsheet.Name,
                             TableName=tname,
                             DefaultVersion=win32c.xlPivotTableVersion10)
    wb.Sheets(newsheet.Name).Select()
    wb.Sheets(newsheet.Name).Cells(3,1).Select()
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1

    wb.ActiveSheet.PivotTables(tname).AddDataField(
        wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
        sumvalue,
        win32c.xlSum)
    if len(sortfield) != 0:
        wb.ActiveSheet.PivotTables(tname).PivotFields(sortfield[0]).AutoSort(sortfield[1], sumvalue)
    newsheet.Name = title

    # Uncomment the next command to limit output file size, but make sure
    # to click Refresh Data on the PivotTable toolbar to update the table
    # newsheet.PivotTables(tname).SaveData = False

    return tname

def runexcel():
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    #excel.Visible = True
    try:
        wb = excel.Workbooks.Open('ABCDCatering.xls')
    except Exception:
        print(&quot;Failed to open spreadsheet ABCDCatering.xls&quot;)
        sys.exit(1)
    ws = wb.Sheets('Sheet1')
    xldata = ws.UsedRange.Value
    newdata = []
    for row in xldata:
        if len(row) == 13 and row[-1] is not None:
            newdata.append(list(row))
    lasthdr = &quot;Col A&quot;
    for i,field in enumerate(newdata[0]):
        if field is None:
            newdata[0][i] = lasthdr + &quot; Name&quot;
        else:
            lasthdr = newdata[0][i]
    rowcnt = len(newdata)
    colcnt = len(newdata[0])
    wsnew = wb.Sheets.Add()
    wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(rowcnt,colcnt)).Value = newdata
    wsnew.Columns.AutoFit()

    src = &quot;%s!R1C1:R%dC%d&quot;%(wsnew.Name,rowcnt,colcnt)

    # What were the total sales in each of the last four quarters?
    addpivot(wb,src,
             title=&quot;Sales by Quarter&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # What are the sales for each food item in each quarter?
    addpivot(wb,src,
             title=&quot;Sales by Food Item&quot;,
             filters=(),
             columns=(&quot;Food Name&quot;,),
             rows=(&quot;Fiscal Quarter&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=())

    # Who were the top 10 customers for ABCD Catering in 2009?
    addpivot(wb,src,
             title=&quot;Top 10 Customers&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Company Name&quot;,),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Company Name&quot;,win32c.xlDescending))

    # Who was the highest producing sales rep for the year?
    addpivot(wb,src,
             title=&quot;Top Sales Reps&quot;,
             filters=(),
             columns=(),
             rows=(&quot;Sales Rep Name&quot;,&quot;Company Name&quot;),
             sumvalue=&quot;Sum of Net Booking&quot;,
             sortfield=(&quot;Sales Rep Name&quot;,win32c.xlDescending))

    # What food item had the highest unit sales in Q4?
    ptname = addpivot(wb,src,
             title=&quot;Unit Sales by Food&quot;,
             filters=(&quot;Fiscal Quarter&quot;,),
             columns=(),
             rows=(&quot;Food Name&quot;,),
             sumvalue=&quot;Sum of Quantity&quot;,
             sortfield=(&quot;Food Name&quot;,win32c.xlDescending))
    wb.Sheets(&quot;Unit Sales by Food&quot;).PivotTables(ptname).PivotFields(&quot;Fiscal Quarter&quot;).CurrentPage = &quot;2009-Q4&quot;

    if int(float(excel.Version)) &gt;= 12:
        wb.SaveAs('newABCDCatering.xlsx',win32c.xlOpenXMLWorkbook)
    else:
        wb.SaveAs('newABCDCatering.xls')
    excel.Application.Quit()

if __name__ == &quot;__main__&quot;:
    runexcel()
</pre>
<p><strong>Prerequisites</strong><br />
Python (refer to <a href="http://www.python.org">http://www.python.org</a>)</p>
<p>Microsoft Excel (refer to <a href="http://office.microsoft.com/excel">http://office.microsoft.com/excel</a>)</p>
<p><strong>Source Files and Scripts</strong><br />
Source for the program erpdatapivot.py and input spreadsheet file ABCDCatering.xls are available at<br /><a href="http://github.com/pythonexcels/examples">http://github.com/pythonexcels/examples</a></p>
<p>Thanks &#8212; Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythonexcels.com/2009/11/automating-pivot-tables-with-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing Pivot Tables</title>
		<link>http://www.pythonexcels.com/2009/11/introducing-pivot-tables/</link>
		<comments>http://www.pythonexcels.com/2009/11/introducing-pivot-tables/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 22:25:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Excel]]></category>

		<guid isPermaLink="false">http://www.pythonexcels.com/?p=208</guid>
		<description><![CDATA[A working knowledge of Microsoft Excel is now a prerequisite for just about every working professional using a computer on a regular basis.  But in my professional life, I've found few people who have a solid knowledge of pivot tables and are really comfortable using them in Excel.  This post introduces Pivot Tables and provides examples illustrating how they can be used to analyze corporate data. ]]></description>
			<content:encoded><![CDATA[<p>A working knowledge of Microsoft Excel is now a prerequisite for just about every working professional using a computer on a regular basis.  Excel training classes can be found in junior colleges, adult education centers, computer training centers, libraries, job search centers and of course online.  Excel basics are even taught in high schools.  </p>
<p>But in my professional life, I&#8217;ve found few people who have a solid knowledge of pivot tables and are really comfortable using them in Excel.  If you aren&#8217;t aware of pivot tables or haven&#8217;t had the time to try out this function in Excel, pivot tables provide a way to cross tabulate, sort, segregate and aggregate tabular data, enabling you to quickly summarize data and extract totals, averages, and other information from the source data.  I first found out about pivot tables when I was working with our business unit financial analyst more than 10 years ago.  She didn&#8217;t like our corporate ERP system (SAP) any more than I did, and found it much faster to dump the raw data into Excel and get the answers to her questions by using a pivot table.  I&#8217;ve been a pivot table convert ever since.</p>
<p>Using the spreadsheet newABCDCatering.xls developed in the <a href="http://www.pythonexcels.com">last post</a>, let&#8217;s add a pivot table and answer the questions raised previously: </p>
<ul>
<li>What were the total sales in each of the last four quarters?</li>
<li>What are the sales for each food item in each quarter?</li>
<li>Who were the top 10 customers for ABCD Catering in 2009?</li>
<li>Who was the highest producing sales rep for the year?</li>
<li>What food item had the highest unit sales in Q4?</li>
</ul>
<p>To build the pivot table, begin by selecting the entire data table in the Sheet2 worksheet: click cell A1, then type the Control-* key combination (hold down Ctrl and press *).  This selects the data in the table without also grabbing the blank cells surrounding the table.  It is effectively the same as selecting cell A1 and dragging to the last column and row of data while holding down the left mouse button.</p>
<p>Next, if you&#8217;re using Excel 2007 or later, select the Insert tab, then select PivotTable from the Pivot Table icon.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_makepivotxl12.png" alt="20091111_makepivotxl12" title="20091111_makepivotxl12" width="333" height="304" class="alignnone size-full wp-image-209" /></p>
<p>Because you&#8217;ve selected the spreadsheet data, the dialog should already be populated with the range <code>Sheet2!$A1:$M791</code> as shown below.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_createdialogxl12.png" alt="20091111_createdialogxl12" title="20091111_createdialogxl12" width="548" height="503" class="alignnone size-full wp-image-210" /></p>
<p>Click OK to create the empty pivot table.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_wizardxl12.png" alt="20091111_wizardxl12" title="20091111_wizardxl12" width="530" height="690" class="alignnone size-full wp-image-211" /></p>
<p>In Excel 2003 and earlier versions, select the table data as described above, then select <code>Data->Pivot Table and Pivot Chart Report</code> to create the pivot table.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_makepivotxl10.png" alt="20091111_makepivotxl10" title="20091111_makepivotxl10" width="528" height="200" class="alignnone size-full wp-image-212" /></p>
<p>You're presented with a three-step wizard.  For now, just click <code>Next</code> in the first two dialogs, then <code>Finish</code> in the final dialog.  </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_wizardxl10_1.png" alt="20091111_wizardxl10_1" title="20091111_wizardxl10_1" width="476" height="330" class="alignnone size-full wp-image-213" /><br />
<img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_wizardxl10_2.png" alt="20091111_wizardxl10_2" title="20091111_wizardxl10_2" width="398" height="119" class="alignnone size-full wp-image-214" /><br />
<img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_wizardxl10_3.png" alt="20091111_wizardxl10_3" title="20091111_wizardxl10_3" width="531" height="238" class="alignnone size-full wp-image-215" /></p>
<p>Once you've completed the above steps, you'll see the following displayed in older versions of Excel.<br />
<img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_pivotfieldlistxl10.png" alt="20091111_pivotfieldlistxl10" title="20091111_pivotfieldlistxl10" width="550" height="347" class="alignnone size-full wp-image-216" /></p>
<p>Now you're ready to do some data analysis. </p>
<p><strong>What were the total sales in each of the last four quarters?</strong></p>
<p>To understand the sales for the last four quarters, create a pivot table with "Fiscal Quarter" as a Row Label, and "Net Booking" as a Values field.  To do this, drag the field "Fiscal Quarter" to the Row Labels section, and "Net Booking" to the Values section.  (In older Excel versions, drag the "Fiscal Quarter" field directly onto the spreadsheet to the "Drop Row Fields Here" area, then drag the "Net Booking" field onto the "Drop Data Items Here" area.)</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_ptsetupxl12.png" alt="20091111_ptsetupxl12" title="20091111_ptsetupxl12" width="336" height="814" class="alignnone size-full wp-image-217" /></p>
<p>Your spreadsheet should now look something like this: </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_salesbyqtrxl12.png" alt="20091111_salesbyqtrxl12" title="20091111_salesbyqtrxl12" width="530" height="705" class="alignnone size-full wp-image-218" /></p>
<p>The header for the table data should say "Sum of Net Booking". If it doesn't, double-click the header text and select "Sum" in the list box "Summarize value field by", or right-click the text and select <code>Summarize Data By->Sum</code>.  </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_setsumxl12.png" alt="20091111_setsumxl12" title="20091111_setsumxl12" width="533" height="525" class="alignnone size-full wp-image-219" /><br />
<img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_setsum2xl12.png" alt="20091111_setsum2xl12" title="20091111_setsum2xl12" width="570" height="465" class="alignnone size-full wp-image-220" /></p>
<p>Based on the spreadsheet data, the total net bookings in each of the last four quarters were $83465, $77180, $79605 and $77440 respectively.  </p>
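<p>If it helps to see what the pivot table is doing behind the scenes, here is a minimal sketch in plain Python of a Row Label combined with a Sum value field.  The <code>records</code> list and the <code>sum_by_field</code> helper are made up for illustration and are not taken from the ABCD Catering spreadsheet.</p>
<pre class="brush: python;">
from collections import defaultdict

# Hypothetical sample records: (fiscal quarter, net booking)
records = [
    ('2009-Q1', 120.0),
    ('2009-Q2', 80.0),
    ('2009-Q1', 45.0),
    ('2009-Q2', 20.0),
    ('2009-Q3', 60.0),
]

def sum_by_field(rows):
    """Total the values for each distinct label, like a Row Label plus Sum."""
    totals = defaultdict(float)
    for label, value in rows:
        totals[label] += value
    return dict(totals)

quarter_totals = sum_by_field(records)
# Print one line per quarter, in label order
for quarter in sorted(quarter_totals):
    print(quarter, quarter_totals[quarter])
</pre>
<p>Each distinct quarter becomes one row, and every booking that carries that quarter is added into its total, which is the same grouping Excel performs when you drop a field into the Row Labels and Values sections.</p>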
<p><strong>What are the sales for each food item in each quarter?</strong></p>
<p>To answer this question, we need the same fields as set up previously (Fiscal Quarter as a Row Label, Sum of Net Booking as a Value field), plus a column header for "Food Name".  Remember that "Food" represents the numerical identifier for each food item, and "Food Name" contains the text description.  Drag "Food Name" to the Column Labels section (in older versions of Excel, drag it to the "Drop Column Fields Here" area).  The spreadsheet should now look like this: </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_salesbyfooditemxl12.png" alt="20091111_salesbyfooditemxl12" title="20091111_salesbyfooditemxl12" width="550" height="592" class="alignnone size-full wp-image-221" /></p>
<p>Note that each food item is listed as a column header, and each of the four quarters is listed as a row header.  Using this table you can quickly scan the data and understand the sales for each food item.  For example, Caesar Salad sales were $7890, $7140, $7960 and $6990 in each of the respective quarters. </p>
<p><strong>Who were the top 10 customers for ABCD Catering in 2009?</strong></p>
<p>Again, Sum of Net Booking is the data value, but we no longer need the Food Name or Fiscal Quarter data fields.  Remove them by selecting them in the Row Labels or Column Labels boxes and dragging them back to the top, or by clicking the small triangle and selecting "Remove Field".</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_removefieldxl12.png" alt="20091111_removefieldxl12" title="20091111_removefieldxl12" width="239" height="455" class="alignnone size-full wp-image-222" /></p>
<p>In Excel 2003 and earlier versions, select the column or row header and drag it back into the Field Chooser widget.</p>
<p>Now, add the Company Name field to the table by dragging it to the Row Labels box (or "Drop Row Fields Here" area in older Excel).  The pivot table now contains the list of companies and their purchases, listed in alphabetical order.  To find the top 10 customers, select the booking number for Adobe Systems, right-click and select <code>Sort->Sort Largest to Smallest</code>.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_sortxl12.png" alt="20091111_sortxl12" title="20091111_sortxl12" width="550" height="387" class="alignnone size-full wp-image-223" /></p>
<p>In older versions of Excel, select a booking number and click the "Sort Descending" icon in the tool bar, or select "Data->Sort" from the menu and select the descending sort.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_sortdesc.png" alt="20091111_sortdesc" title="20091111_sortdesc" width="436" height="234" class="alignnone size-full wp-image-224" /></p>
<p>The list is now sorted.  The top 10 customers for ABCD Catering are Hewlett-Packard, Intel, Oracle, Cisco Systems, Sanmina SCI, Sun Microsystems, Apple, Con-Way, eBay and Yahoo.  </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_top10.png" alt="20091111_top10" title="20091111_top10" width="350" height="369" class="alignnone size-full wp-image-225" /></p>
<p><strong>Who was the highest producing sales rep for the year?</strong></p>
<p>At ABCD Catering, sales reps cover multiple accounts.  To find the highest producing rep, remove the Company Name field, replace it with the Sales Rep Name field and sort by Sum of Net Booking.  The top five sales reps are Dave Davidson, Lin Linares, Carl Carlson, Kay Kaywood and Nicole Nichols. </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_top10reps.png" alt="20091111_top10reps" title="20091111_top10reps" width="554" height="779" class="alignnone size-full wp-image-226" /></p>
<p>What accounts are these top reps responsible for?  To find out, drag the Company Name field into the Row Labels area.  </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_top10repsaccts.png" alt="20091111_top10repsaccts" title="20091111_top10repsaccts" width="642" height="777" class="alignnone size-full wp-image-227" /></p>
<p>In older versions of Excel, drag Company Name directly onto the table.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_top10repsacctsxl10.png" alt="20091111_top10repsacctsxl10" title="20091111_top10repsacctsxl10" width="419" height="362" class="alignnone size-full wp-image-228" /></p>
<p>Since Hewlett-Packard, Intel and Cisco Systems were 3 of the top 4 producing accounts, it's no surprise that their sales rep Dave Davidson was the top performer.  </p>
<p><strong>What food item had the highest unit sales in Q4?</strong></p>
<p>To find the food item with the highest unit sales, change the data value field to Sum of Quantity by removing Sum of Net Booking, adding Quantity and making sure the Value Field Setting is "Sum" and not some other setting.  Next, remove the Sales Rep Name and Company Name row header fields and replace them with Food Name.  To limit the data to Q4, drag the Fiscal Quarter field to the Report Filter area, and select "2009-Q4".<br />
<img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_quarterfilterxl12.png" alt="20091111_quarterfilterxl12" title="20091111_quarterfilterxl12" width="276" height="389" class="alignnone size-full wp-image-229" /></p>
<p>Finally, do a descending sort on the Sum of Quantity field to find the item with the highest unit sales.  </p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/11/20091111_highestunit.png" alt="20091111_highestunit" title="20091111_highestunit" width="270" height="393" class="alignnone size-full wp-image-230" /></p>
<p>The number one item by unit volume was Potato Chips, followed by Soda and Churro.  </p>
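<p>The three steps used for this question (filter on a quarter, sum a quantity, sort descending) can be sketched in a few lines of plain Python.  The <code>orders</code> rows and the <code>unit_sales</code> helper below are hypothetical, and the quantities are invented rather than copied from the spreadsheet.</p>
<pre class="brush: python;">
from collections import defaultdict

# Hypothetical sample records: (fiscal quarter, food name, quantity)
orders = [
    ('2009-Q4', 'Soda', 30),
    ('2009-Q4', 'Potato Chips', 50),
    ('2009-Q3', 'Soda', 90),
    ('2009-Q4', 'Churro', 20),
    ('2009-Q4', 'Soda', 15),
]

def unit_sales(rows, quarter):
    """Sum quantity per food item for one quarter, largest total first."""
    totals = defaultdict(int)
    for row_quarter, food, quantity in rows:
        if row_quarter == quarter:      # the Report Filter step
            totals[food] += quantity    # the Sum of Quantity step
    # The descending sort step
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = unit_sales(orders, '2009-Q4')
print(ranking)
</pre>
<p>Note that the Q3 order is dropped before any summing happens, just as the Report Filter hides the other quarters before the pivot table totals the remaining rows.</p>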
<p>Hopefully this gives you a feel for the power and flexibility of pivot tables.  In the next post, we'll automate everything with Python and generate a simple framework for quickly building pivot tables.  </p>
<p><strong>Prerequisites</strong></p>
<p>Microsoft Excel (refer to <a href="http://office.microsoft.com/excel">http://office.microsoft.com/excel</a>)</p>
<p><strong>Source Files and Scripts</strong><br />
The spreadsheet newABCDCatering.xls is available at<br /><a href="http://github.com/pythonexcels/excelexamples/tree/master">http://github.com/pythonexcels/excelexamples/tree/master</a></p>
<p>Thanks --- Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythonexcels.com/2009/11/introducing-pivot-tables/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Cleaning Up Corporate ERP Data</title>
		<link>http://www.pythonexcels.com/2009/11/cleaning-up-corporate-erp-data/</link>
		<comments>http://www.pythonexcels.com/2009/11/cleaning-up-corporate-erp-data/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 05:50:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[ERP]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.pythonexcels.com/?p=191</guid>
		<description><![CDATA[The previous posts have used Excel and Python to create and manipulate small spreadsheets.  In reality, Python and Excel are especially well suited to tackling large data sets.  This post will illustrate some techniques for cleaning up data downloaded from corporate ERP systems such as SAP and Oracle, and getting it ready for some serious data mining with Excel.]]></description>
			<content:encoded><![CDATA[<p>The previous posts have used Excel and Python to create and manipulate small spreadsheets.  In reality, Python and Excel are especially well suited to tackling large data sets.  This post will illustrate some techniques for cleaning up data downloaded from corporate ERP systems such as SAP and Oracle, and getting it ready for some serious data mining with Excel.</p>
<p>In this example, a fictional company called ABCD Catering has recorded sales and order history for 2009 in their corporate ERP system.  ABCD Catering provides catering services to leading Silicon Valley companies, providing the best in hamburgers, hot dogs, churros, sodas and other comfort food.  Your boss has asked you to examine this data and answer some questions and produce charts representing some of the data:</p>
<ul>
<li>What were the total sales in each of the last four quarters?</li>
<li>What are the sales for each food item in each quarter?</li>
<li>Who were the top 10 customers for ABCD Catering in 2009?</li>
<li>Who was the highest producing sales rep for the year?</li>
<li>What food item had the highest unit sales in Q4?</li>
</ul>
<p>Generating this information typically involves running five separate reports in the system.  Since your boss is looking for this same information at the end of each quarter, you want to simplify your life and your boss&#8217;s by automating the report.  Using Python and Excel, you can download a spreadsheet copy of the raw data, process it, generate the key figures and charts and save them to a spreadsheet.</p>
<p>Take a look at the data in ABCDCatering.xls:</p>
<p><img class="alignnone size-full wp-image-192" title="20091102_original" src="http://www.pythonexcels.com/wp-content/uploads/2009/10/20091102_original.png" alt="20091102_original" width="550" height="255" /></p>
<p>The spreadsheet contains some header information, then a large table of records for each order.  Each record contains the fiscal year and quarter, food item, company name, order data, sales representative, booking and order quantity for each order.  The data needs some work before you can use it in a pivot table.  First, the data in rows 1 through 11 must be ignored because it&#8217;s meaningless for the pivot table.  Also, some columns do not have a proper header and must be corrected before the data can be used.  The good news is that after some minor massaging, this data will be ideally suited for processing with a pivot table in Excel.  Close the spreadsheet and get ready to build the reports.</p>
<p>The program begins with the standard boilerplate: import the win32 module and start Excel.  If you have questions on this, please refer to the <a href="http://www.pythonexcels.com/2009/09/basic-excel-driving-with-python">earlier</a> <a href="http://www.pythonexcels.com/2009/10/python-excel-mini-cookbook">posts</a>.</p>
<pre class="brush: python;">
#
# erpdata.py: Load raw ERP data and clean up header info
#
import win32com.client as win32
import sys
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = True
</pre>
<p>Next, open the spreadsheet ABCDCatering.xls with some exception handling.  The <code>try/except</code> clause attempts to open the file with the <code>Workbooks.Open()</code> method, and exits gracefully if the file is missing or some other problem occurs.  Lastly, the variable <code>ws</code> is set to the worksheet containing the data.</p>
<pre class="brush: python; first-line: 8">
try:
    wb = excel.Workbooks.Open('ABCDCatering.xls')
except Exception:
    print("Failed to open spreadsheet ABCDCatering.xls")
    sys.exit(1)
ws = wb.Sheets('Sheet1')
</pre>
<p>An easy way to load the entire spreadsheet into Python is the <code>UsedRange</code> property.  The following command:</p>
<pre class="brush: python; first-line: 14">
xldata = ws.UsedRange.Value
</pre>
<p>grabs all the data in the Sheet1 worksheet and copies it into a tuple named <code>xldata</code>.  Once inside Python, the data can be manipulated and placed back into the spreadsheet with minimal calls to the COM interface, resulting in faster, more efficient processing.</p>
<p>To delete rows, add columns and do other operations on the data, it must be converted to or copied to a list.  The approach used here is to examine the data row by row, discarding the nonessential header rows and copying everything else to a new list.  The first step is to remove the rows that are not part of the column header row or record data.  If you are developing the program interactively in Python, you can investigate the data in the <code>xldata</code> tuple and display the data for the first row (<code>xldata[0]</code>) and the header row (<code>xldata[11]</code>):</p>
<p><img class="alignnone size-full wp-image-193" title="20091102_xldata0" src="http://www.pythonexcels.com/wp-content/uploads/2009/10/20091102_xldata0.png" alt="20091102_xldata0" width="570" height="132" /></p>
<p>The length of both rows is 13, though <code>xldata[0]</code> contains many elements with a value of <code>None</code>.  The following code checks the length of the data and skips any rows shorter than 13 fields or rows that contain <code>None</code> in the last field.  Note that this code assumes that the actual data in the table always contains complete records.  That is true for this dataset, but you should always understand the characteristics of the data you&#8217;re working on.</p>
<pre class="brush: python; first-line: 15">
newdata = []
for row in xldata:
    if row[-1] is not None and len(row) == 13:
        newdata.append(list(row))
</pre>
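<p>To try the filtering logic without starting Excel, here is a small sketch that runs on a hand-made stand-in for <code>ws.UsedRange.Value</code>.  The sample rows, the four-column width and the <code>keep_table_rows</code> helper are assumptions made for illustration; the real sheet has 13 columns.</p>
<pre class="brush: python;">
# Hypothetical stand-in for ws.UsedRange.Value: a tuple of row tuples,
# shortened to 4 columns instead of the 13 in the real sheet.
xldata = (
    ('ABCD Catering', None, None, None),           # report title row
    (None, None, None, None),                      # blank row
    ('Fiscal Quarter', 'Food', None, 'Quantity'),  # header row
    ('2009-Q1', 101.0, 'Churro', 12.0),            # data rows
    ('2009-Q1', 102.0, 'Soda', 30.0),
)

def keep_table_rows(rows, width):
    """Keep rows of the expected width whose last field is filled in."""
    return [list(row) for row in rows
            if len(row) == width and row[-1] is not None]

newdata = keep_table_rows(xldata, 4)
print(len(newdata))
print(newdata[0])
</pre>
<p>Each kept row is copied with <code>list(row)</code> because the rows arrive as tuples, and a tuple cannot be modified in place when the header is patched up later.</p>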
<p>The <code>newdata</code> list now contains the header and data rows from the spreadsheet, but the header row is still not complete.  All column headers must contain text in order to use this data in a pivot table.  Unfortunately, the spreadsheet downloads produced by the ERP system have the column label over the numerical identifier for the item, while the text column header is blank.  You can see that for the &#8220;Food&#8221; and &#8220;Company&#8221; data below.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/10/20091102_foodcompany.png" alt="20091102_foodcompany" title="20091102_foodcompany" width="454" height="216" class="alignnone size-full wp-image-194" /></p>
<p>One approach that works for this data is to scan the header and insert a column header based on the contents of the previous column.  For example, the label for column F could be &#8220;Company Name&#8221;, created by simply appending the text &#8220; Name&#8221; to the column header &#8220;Company&#8221; from the prior column.  Using this simple algorithm, the column header row can be filled out and the spreadsheet made ready for pivot table conversion.  A more complex lookup could be used as well, but the simple algorithm described here will scale if new fields are added to the report.</p>
<pre class="brush: python; first-line: 19">
lasthdr = "Col A"
for i,field in enumerate(newdata[0]):
    if field is None:
        newdata[0][i] = lasthdr + " Name"
    else:
        lasthdr = newdata[0][i]
</pre>
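<p>The same header-filling idea can be wrapped in a function and tested on its own.  This is only a sketch: <code>fill_headers</code> and the sample header row are hypothetical, and unlike the loop above it returns a new list instead of changing the row in place.</p>
<pre class="brush: python;">
def fill_headers(header, default='Col A'):
    """Return a copy of header with blanks named after the column before."""
    filled = []
    lasthdr = default
    for field in header:
        if field is None:
            # Blank header: borrow the most recent non-blank label
            filled.append(lasthdr + ' Name')
        else:
            lasthdr = field
            filled.append(field)
    return filled

# Hypothetical header row, shortened for illustration
header = ['Fiscal Quarter', 'Food', None, 'Company', None, 'Quantity']
print(fill_headers(header))
</pre>
<p>One thing to watch for: if two blank headers appear side by side, both receive the same generated name, so it is worth checking the result for duplicates when the report layout changes.</p>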
<p>Now the data is ready for insertion back into the spreadsheet.  To enable comparison between the new data set and the original, create a new sheet in the workbook, write the data to the new sheet and autofit the columns.</p>
<pre class="brush: python; first-line: 25">
wsnew = wb.Sheets.Add()
wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(len(newdata),len(newdata[0]))).Value = newdata
wsnew.Columns.AutoFit()
</pre>
<p>The last step is to save the workbook to a new file and quit Excel.  The Excel version is checked in order to save the data in the correct spreadsheet format.  Version 12 corresponds to Excel 2007, which uses the <code>.xlsx</code> file extension.  You also have to specify the constant <code>xlOpenXMLWorkbook</code> to define the type of output Excel file. Earlier versions of Excel use the <code>.xls</code> extension, and because the input file was in .xls format, no output format specifier is needed for users of older versions of Excel.</p>
<pre class="brush: python; first-line: 28">
if int(float(excel.Version)) >= 12:
    wb.SaveAs('newABCDCatering.xlsx',win32.constants.xlOpenXMLWorkbook)
else:
    wb.SaveAs('newABCDCatering.xls')
excel.Application.Quit()
</pre>
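<p>The <code>int(float(...))</code> step is there because <code>excel.Version</code> is a string such as <code>'12.0'</code>, and calling <code>int()</code> on that string directly raises a <code>ValueError</code>.  The version check can be tried without Excel using a small helper like the one below; <code>output_name</code> is a hypothetical function written for this illustration, not part of the script.</p>
<pre class="brush: python;">
def output_name(version, stem='newABCDCatering'):
    """Pick a file extension from an Excel version string such as '12.0'."""
    major = int(float(version))   # '12.0' becomes 12.0, then 12
    if major >= 12:
        return stem + '.xlsx'     # Excel 2007 and later
    return stem + '.xls'          # Excel 2003 and earlier

for version in ('11.0', '12.0', '14.0'):
    print(version, output_name(version))
</pre>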
<p>If the file <code>newABCDCatering.xlsx</code> or <code>newABCDCatering.xls</code> already exists in My Documents, you will see the following popup when you run the script.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/10/20091102_existspopup.png" alt="20091102_existspopup" title="20091102_existspopup" width="525" height="125" class="alignnone size-full wp-image-195" /></p>
<p>Click &#8220;Yes&#8221; to overwrite the spreadsheet file.   To run the script cleanly, erase the file <code>newABCDCatering.xlsx</code> or <code>newABCDCatering.xls</code> and try the script again.</p>
<p>After running the script, open the file newABCDCatering.xlsx or newABCDCatering.xls and view the contents.  Note that the extraneous header information has been removed and the blank column headers have been filled in programmatically as described earlier.</p>
<p><img src="http://www.pythonexcels.com/wp-content/uploads/2009/10/20091102_exceloutput.png" alt="20091102_exceloutput" title="20091102_exceloutput" width="588" height="228" class="alignnone size-full wp-image-196" /></p>
<p>The new spreadsheet is ready for use in a pivot table, which will be covered in the next post.   Here is the complete script, also available at <a href="http://github.com/pythonexcels/examples">github</a>.</p>
<pre class="brush: python;">
#
# erpdata.py: Load raw ERP data and clean up header info
#
import win32com.client as win32
import sys
excel = win32.gencache.EnsureDispatch('Excel.Application')
#excel.Visible = True
try:
    wb = excel.Workbooks.Open('ABCDCatering.xls')
except Exception:
    print(&quot;Failed to open spreadsheet ABCDCatering.xls&quot;)
    sys.exit(1)
ws = wb.Sheets('Sheet1')
xldata = ws.UsedRange.Value
newdata = []
for row in xldata:
    if len(row) == 13 and row[-1] is not None:
        newdata.append(list(row))
lasthdr = &quot;Col A&quot;
for i,field in enumerate(newdata[0]):
    if field is None:
        newdata[0][i] = lasthdr + &quot; Name&quot;
    else:
        lasthdr = newdata[0][i]
wsnew = wb.Sheets.Add()
wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(len(newdata),len(newdata[0]))).Value = newdata
wsnew.Columns.AutoFit()
if int(float(excel.Version)) &gt;= 12:
    wb.SaveAs('newABCDCatering.xlsx',win32.constants.xlOpenXMLWorkbook)
else:
    wb.SaveAs('newABCDCatering.xls')
excel.Application.Quit()
</pre>
<p><strong>Prerequisites</strong><br />
Python (refer to <a href="http://www.python.org">http://www.python.org</a>)</p>
<p>Win32 Python module (refer to <a href="http://sourceforge.net/projects/pywin32">http://sourceforge.net/projects/pywin32</a>)</p>
<p>Microsoft Excel (refer to <a href="http://office.microsoft.com/excel">http://office.microsoft.com/excel</a>)</p>
<p><strong>Source Files and Scripts</strong><br />
Source for the program erpdata.py and spreadsheet file ABCDCatering.xls are available at<br />
<a href="http://github.com/pythonexcels/examples">http://github.com/pythonexcels/examples</a></p>
<p>Thanks &#8212; Dan</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythonexcels.com/2009/11/cleaning-up-corporate-erp-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
