Some thoughts on choosing a stat package (in no special order).
A lot depends on what kind of work you are doing.
There is a prevailing idea that FDA requires use of SAS, even though that would be an illegal requirement. However, it can be easier to go along to get along.
If you are in a field where clean data is supplied, then it makes less difference which package you use. If data is generated by people it can be very messy, and dealing with the mess is an important consideration.
Be sure that comparisons are based on current capabilities. For example, in the early 80s, default SAS would take about twice the machine resources to read a 1.5 million record file and do a single procedure as would default SPSS. Today machine resources are not as much of a consideration as they were then.
Back then SAS would only run on IBM hardware. Today SAS runs on many platforms. I do not recall where, but sometime in the last week I saw a claim on a discussion list that SPSS does not do mixed models. It has for many years. Likewise, a few months ago I saw a statement on a discussion list that SPSS does not do general linear models, generalized linear models, or complex samples. Historically each of those statements was true at one time, but none has been true for many years.
For packages that have modules, it is not necessary for every user to have access to all of the modules all of the time.
Many packages can write files in the format for other packages.
Beware of slanted views. I have heard people say that SPSS had to be useless because the very idea of social science is nonsense. I've heard "SAS is only for ag experiments".
Some packages can interface easily with other resources such as Python or R. E.g., SPSS can call R. That way all the data prep can be done in SPSS, and if one wants something less common one can use R without dealing with its data handling.
If the users and QA reviewers are strong readers of English, then readability of syntax is a big advantage.
If tasks are diverse, then similarity between procedure syntax is an advantage. That is less of a consideration if you just do one narrow type of statistics.
If you have large amounts of data, then a package that holds all data in RAM is a disadvantage.
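To illustrate the RAM point: a one-pass summary does not need the whole file in memory. Below is a minimal Python sketch, assuming a CSV file with one numeric column. The function name streaming_mean_sd is made up for illustration, and the running update is Welford's method, not a description of how any particular package works.

```python
import csv
import math


def streaming_mean_sd(path, column):
    """Return (n, mean, sample SD) of one numeric column, reading row by row."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            text = (row.get(column) or "").strip()
            if not text:
                continue  # skip blank cells rather than guessing a value
            x = float(text)
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    if n == 0:
        return 0, float("nan"), float("nan")
    sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, sd
```

Only one row is held at a time, so memory use stays flat however many records the file has.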
In some disciplines it is not unusual for 95% of the labor hours spent using a package to go to cleaning and prepping the data. In those circumstances human factors aspects are extremely important.
Ease of extensive labeling helps non-statisticians or beginners interpret results.
It can be useful for developing syntax to be able to distinguish whether (a) the value of a variable is missing because the system cannot follow your instructions, e.g., input does not match the format specified or the instructions call for unreasonable operations like square root of a negative number, OR
(b) it is missing for a known reason, like this case was not selected for a sample vs. refused to respond at all vs. skipped this item, etc. In some fields, it is good practice to be sure that data contains no values that are missing for unspecified reasons. I have found this very useful in developing syntax.
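The (a) vs (b) distinction can be sketched outside any particular package. In this Python sketch the codes -7, -8, and -9, their labels, and the function names are all invented for illustration; None stands in for a value the system could not produce.

```python
import math

# Hypothetical user-missing codes: each one records a known reason.
USER_MISSING = {
    -7: "not selected for sample",
    -8: "refused to respond",
    -9: "skipped this item",
}


def safe_sqrt(value):
    """Square root that keeps the two kinds of missing apart."""
    if value is None:
        return None  # already system-missing
    if value in USER_MISSING:
        return value  # carry the known reason forward unchanged
    if value < 0:
        return None  # unreasonable operation: becomes system-missing
    return math.sqrt(value)


def classify(value):
    """Label a value as valid, user-missing (with reason), or system-missing."""
    if value is None:
        return "system-missing: check the syntax or the input format"
    if value in USER_MISSING:
        return "user-missing: " + USER_MISSING[value]
    return "valid"
```

For example, classify(safe_sqrt(-4)) reports a system-missing result that points back at the syntax, while classify(safe_sqrt(-8)) still reports that the respondent refused.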
Packages that are strictly point and click can be great for small projects like small exercises. However, they do not have the audit trail necessary for projects where quality review or maintainability is important. That said, a GUI that can <paste> syntax can be useful for producing early drafts of syntax. In my experience, YMMV, cleaning, prepping, and analysis are an iterative process like any other writing. {I have even gone back and rephrased part of this post.} Whether you call it "an audit trail", "due diligence", "careful communication", or "CYA", availability of syntax is a major consideration.
When I have clients who want to distribute data for public use, I suggest that they have at least 1 seat that has SPSS, which carries the most metadata in its system files (value labels for valid and missing values, level of measurement, variable labels, etc.). Then completely fill in the Variable View, which shows the metadata. Then it is simple to "save as" SPSS, SAS, flat file, csv, etc. DISPLAY DICTIONARY provides documentation that can be used in conjunction with packages that hold less metadata in their system files.
In my experience, YMMV, purchase cost is only a small part of total cost of ownership. The money and calendar time spent writing syntax, cleaning and prepping the data, doing QA review, and creating a solid audit trail vastly outweigh purchase cost as part of the long-term total cost of operation.
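As a rough picture of what such documentation carries when it has to travel next to a flat file, here is a small Python sketch. The variable names, labels, and codes are invented, and the layout is only an illustration; it is not the output format of DISPLAY DICTIONARY.

```python
# Hypothetical metadata for two variables; names, labels, and codes are invented.
CODEBOOK = {
    "sex": {
        "label": "Sex of respondent",
        "level": "nominal",
        "values": {1: "Male", 2: "Female"},
        "missing": {9: "Refused"},
    },
    "age": {
        "label": "Age in years",
        "level": "scale",
        "values": {},
        "missing": {-1: "Not asked"},
    },
}


def render_codebook(codebook):
    """Return a plain-text data dictionary to ship alongside a flat file."""
    lines = []
    for name, meta in codebook.items():
        lines.append(f"{name}: {meta['label']} [{meta['level']}]")
        for code, text in sorted(meta["values"].items()):
            lines.append(f"    {code} = {text}")
        for code, text in sorted(meta["missing"].items()):
            lines.append(f"    {code} = {text} (missing)")
    return "\n".join(lines)
```

Calling print(render_codebook(CODEBOOK)) gives one block per variable with its label, level of measurement, and valid and missing value labels.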
There are strong reasons not to use Excel for actual statistics. See the links below. I do use it for the small spreadsheet tasks for which it was designed. It can be useful as a data entry tool to put the data together.
The following is from one of my soapbox posts, apologies to those of you who are tired of my making this point.
Using spreadsheets for stat can be like using a hammer to drive a screw.
IF I RECALL CORRECTLY any accounting system that uses computer spreadsheets cannot pass ISO certification. I believe this is at least partly due to the difficulties in tracing exactly what was done.
I do use Excel Pro or tables in WordPerfect several times a week.
Sometimes it is practical to enter data via a spreadsheet and then read it into a stat package, e.g., SPSS, for checking double entry and quality checks. [This is because spreadsheets are usually available, but it is not cost effective to provide a stat package to every person who will do data entry.]
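The double-entry check itself is simple in principle: key the same forms twice and list every disagreement. Here is a minimal Python sketch, assuming each keying has been saved as CSV text with a shared ID column. The function name and the column names in the example are hypothetical, and in practice one would read the files exported from the spreadsheet rather than pass strings.

```python
import csv
import io


def double_entry_mismatches(first_csv, second_csv, key):
    """Compare two keyings of the same forms; return a list of discrepancies."""
    def load(text):
        return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

    first, second = load(first_csv), load(second_csv)
    problems = []
    for case_id in sorted(set(first) | set(second)):
        if case_id not in first or case_id not in second:
            problems.append((case_id, "<case>", "present in only one entry"))
            continue
        for field, value in first[case_id].items():
            other = second[case_id].get(field)
            if value != other:
                problems.append((case_id, field, f"{value!r} vs {other!r}"))
    return problems


entry_a = "id,age,sex\n1,34,2\n2,51,1\n3,29,2\n"
entry_b = "id,age,sex\n1,34,2\n2,15,1\n"
for problem in double_entry_mismatches(entry_a, entry_b, "id"):
    print(problem)
```

Each reported tuple names the case, the field, and the two conflicting values, so the paper form can be pulled and the entry resolved.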
For an overview of the issues:
A video can be found at
http://www.spss.com/events/event.cfm?E_ID=2921&Country=US
A pdf can be found by clicking
<The Risks of Using Spreadsheets in Data Analysis>
For more technical review see
http://www.pages.drexel.edu/~bdm25/excel-intro.pdf
http://www.pages.drexel.edu/~bdm25/excel2007.pdf
http://www.pages.drexel.edu/~bdm25/excel-rng.pdf
An instructional link is
http://www.umass.edu/statdata/software/handouts/excel.html
-------------------------------------------
Arthur Kendall
Social Research Consultants
-------------------------------------------
Original Message:
Sent: 01-24-2014 13:23
From: Wayne Fischer
Subject: STATISTICA is good substitution for SAS
Well, since no one else has mentioned it, I will. I have been using JMP (from SAS Institute) for over two years now and am well pleased with it. Very versatile: very broad coverage of statistical analyses integrated with excellent graphics, combined with very good data handling capabilities. Can do various simulations, too.
Single user license lists at $1470, about half that for academic institutions. JMP Pro way more expensive...
-------------------------------------------
Wayne Fischer
Statistician
University of Texas Medical Branch
-------------------------------------------