I appreciate Kim's defense and wanted to add on some 'real life' experiences.
I have used Excel extensively, by necessity. I work in healthcare and spent many years at a company that would not support buying a high end package and I would receive all matter of data in jumbled messes of Excel spreadsheets. I'm an experimental physiologist by training and a 'data scientist' by choice. The '' used to explicitly point out I'm self taught post PhD and that was done on-the-fly MacGuyver style: Using the tools I had available with the additional constraints levied by an over-protective CIO.
What I'm trying to illustrate is that I had a highly demanding job that included software development, clinical education, payor relations, and on, and on, and on...oh, and per the company, I was "the numbers guy", the "data guru", the "stats nerd". The potential need for a clinical psychologist to discuss my feelings about that environment aside, I want to point out a couple good things that came of it.
Kim's comment on Pivot Tables is spot on. They "saved me" as I used them to aggregate data. Once you format the root file, index and tag records, you can roll-up the data quickly in a Pivot Table, then copy/paste special -> values and number formats. That creates a new flat file. If you do this across multiple spreadsheets (because that's how I usually received the data) you can then work within the constraints of the absolute number of references. When I first learned this tool Excel could only handle a minimal amount of records (~65k), but the ~07 update allowed just over 1M. Thus, while I'm sure nearly (if not all) readers will cringe at that disclosure, Excel was my data manipulation tool.
On that note: Array Formulas
Sweet Baby Jesus did those bail me out. Again, they were much more useful in the older versions because with the '07 update many of the complex formulas I needed array formulas for were included (e.g., SUMIF), but they still have a place.
So my main point here is for anyone like me who came to the gun fight with a knife and handcuffs there are many things you CAN do in Excel. Please note the emphasis on CAN vs SHOULD. Chris Rock once said "you can drive a car with your feet if you want to, that don't make it a good F-in idea". However, if you need to deal with this reality here are a few tips and resources.
Analysts are your friends: mrexcel.com (and many other community pages)
They live in this handcuffed world and have to troubleshoot files all the time. They are not trained with tools beyond what they learned in business school, or like me, don't (didn't) have access to them and/or the time to actually learn them. They are treading water and trying to put the kids thru school.
1. Step 1: Create a "Notes" tab (or name it LOG). You'll be manually journaling your work, but for someone learning this is actually quite a good exercise. Reminds me of when I took a physics course and the instructor disallowed calculators. He taught us to estimate, round, and work with nothing more than a pencil and paper. You'll be doing the e-version of that here.
2. Don't leave formulas embedded in the data you process (this bloats the file and prevents further action on data aggregated in pivot tables). Copy, paste special -> values and number formats. Before you do that, copy your formulas and paste them to your log to keep a record. This is how you can error check when you get to a 'common sense' validation step later. You can also build a library of formulas and use them in future files, merely amending them to suit the needs of that dataset.
3. use the .xlsb format. It's the most compressed file type and will also prevent bloat. Excel will 'sh!t the bed' at some point if you just keep adding new spreadsheets. Be mindful of the file size. When it grows, move the data to a new file.
4. Analysis: you will be coding by hand, which is useful for learning. Sure, it's inefficient, but so is having this real world constraint you're living with. I suggest bourbon to cope, YMMV. Expert tip: I find wine is more useful WHILE working.
5. Take advantage of the pivot table formatting that is available. If your users are, as I expect, business end-users, they will only be familiar with Excel. Build interactive tables (i.e., dashboards). Use the page field to allow them to pivot the data and look at it the way they are thinking about the problem, which is usually "How am I going to have this next meeting with Decision Maker X?"
Those are the high points I can think of off the cuff. Email me if you have specific questions. I wrote tutorials so I could push down the methodology to analysts. I have datasets I can de-identify and share and would be happy to.
Jamie
stark.jamie@gmail.com
------------------------------
Jamie Stark
Owner
Sum Integral
Original Message:
Sent: 08-31-2016 12:51
From: Kimberly Love
Subject: Excel's user unfriendliness
All, I'm going to defend Excel just a little bit :)
I have actually taught a workshop on exploratory data analysis in Excel to several groups of folks, and I think as long as you are not working with very large data sets where managing it visually is of no help at all, there are some neat things Excel does that you can take advantage of. Note that I tend to work more with social scientists and academics from other areas who do not have extensive statistical background, so this works for many of my clients.
1) Using filters can be really helpful for a quick look at your data. Turn on Filters (Quick start: Filter data by using an AutoFilter - Excel is for Excel 2010 but very similar in other versions of Office for Windows). I then click the filter variable next to each variable to get an idea of the range of each variable, and can do some quick data cleaning if I see values that just don't belong.
2) Pivot tables are great once you learn how to use them! They also make it much easier to make graphs based on your data (I find that it is really hard to make graphs/figures "from scratch" in Excel, no matter how long I have worked with them). Create or delete a PivotTable or PivotChart report - Excel is again for Excel 2010, but similar to other versions.
3) The Data Analysis ToolPak is a helpful add in that can make quick summaries and even do very simple statistical analyses on the spot. Use the Analysis ToolPak to perform complex data analysis - Excel discusses this in more detail.
4) Some other things that it has been really useful for:
- Color coding cleaned data when I need to communicate data issues and I can't see my clients in person (true for me in many cases, and no statistical program allows this that I know of).
- Making tables look pretty when they come out of other statistical programs and I need to write up a report.
- Creating graphs and figures that my clients can edit themselves. I don't generally like making figures in Excel because I still find it clunky (unless you create them from pivot tables) but when my clients want to be able to edit their own figures it's extremely useful.
Some caveats...
- I've never worked with older versions of Excel for professional work, and I think that's just as well. My understanding is that any "advanced" functions were buggy and just plain awful--I think, like Word, it has come a long way for use in a professional environment.
- I don't ever do any final statistical analyses in Excel. I tend to use it more for quick exploration and data cleaning/manipulation, especially when I have a client sitting with me, but it's excellent for that.
If anyone is interested in more of what I teach in the Excel EDA workshop, just send me an email: kim@krloveqcc.com.
Thanks!
Kim
------------------------------
Kim Love
Owner and Lead Consultant
K. R. Love Quantitative Consulting & Collaboration
Original Message:
Sent: 08-29-2016 11:07
From: Joseph Locascio
Subject: Excel's user unfriendliness
Hi,
In reading some recent postings at ASA Connect about Excel spreadsheets, a pet peeve occurred to me that I thought I might mention. Are there other people out there who, like me, have always found Excel spreadsheets surprisingly & disappointingly user-unfriendly? When I first started experimenting with them years ago, I had the hopeful assumption that I could do a lot of data manipulation (& some basic stats & graphs) easily within Excel, but I found it so hard to use, that I now use it ONLY as a commonly employed method of data transmission, but as soon as I receive data in Excel, I export it out of Excel and into other software (usually SAS or JMP in my case) & do all my data processing within the other software.
For one thing, whenever you look at the bottom of the file in an Excel sheet to see how many observations (usually subjects) there are, i.e., how many rows there are, you must always remember that the number of rows is always one more than the actual number of observations because evidently the Excel developers have never thought to include an unnumbered (or zero numbered) row at the top of the sheet for the necessary function of writing the variable names. I find myself & other people I work with often being misled by that. On top of that, I find the awkward need to write formulas for simple functions, & also unwarned booby-traps like inadvertently sorting on a variable without realizing that none of the other data are being sorted with it.
Is there something I'm missing here, or have other people encountered these annoyances? Just curious. Thanks.
Joseph J. Locascio, Ph.D.,
Bio-Statistician,
Memory and Movement Disorders Units,
Massachusetts Alzheimer's Disease Research Center,
Neurology Dept.,
Massachusetts General Hospital (MGH),
Boston, Massachusetts 02114
Phone: (617) 724-7192
Email:
JLocascio@partners.org
"The information transmitted in this email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this email in error, please contact the sender and delete the material from any computer."
The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.