Discussion: View Thread

Excel files with more than 300 columns

  • 1.  Excel files with more than 300 columns

    Posted 08-13-2013 12:57
    I'd appreciate suggestions preferably for open source software for reading in excel (or .csv) files with more than 300 columns.
    R (the read.csv function) and SAS (9.2) could not read in the file.
    • I'd also prefer not to split the file into smaller files.

    thanks in advance.

    -------------------------------------------
    Chris Barker, Ph.D.
    Publications  Officer - ASA Statistical Consulting Section
    &
    Adjunct Associate Professor of Biostatistics
    - University of Illinois Chicago


    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    -------------------------------------------


  • 2.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 13:05
    Chris, for what it is worth I have had no problem using SAS 9.2 to read in either Excel or csv files with > 300 fields of data.

    -------------------------------------------
    David Mangen
    Owner
    Mangen Research Associates, Inc.
    -------------------------------------------








  • 3.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 13:15


    -------------------------------------------
    Raymond Hoffmann
    Professor
    Medical College of Wisconsin
    -------------------------------------------
    One possibility is that you have a blank file/file name.
    That has lead to premature termination of the read of a csv for me.
    Also check that you don't have any bad/unusual characters;
    most of the time that just leads to an underscore,
    but could be a problem if it is bad enough.

    Ray




  • 4.  RE:Excel files with more than 300 columns

    Posted 08-14-2013 08:58
    When I read files of that type, I set LRECL as others have suggested, but I use LRECL=4096 (that is a power of two, but I don't remember why I chose that specific value).

    I then use SAS code such as I list below.  I tried it with 418 columns to verify it worked.  I am currently using SAS 9.3, but I've used similar code for (much) earlier versions of SAS.

     

    filename file1 dde 'excel'sheet1!r1c1:r1c418';

    data file;

      infile file1 dlm='09'x notab dsd missover lrecl=4096;

      input x1-x418;

    run;



    -------------------------------------------
    Michael Morton
    -------------------------------------------








  • 5.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 13:11
    What error message is SAS returning to you?  Is it possible that there is a Control-Z embedded in the data file?

    -------------------------------------------
    David Mangen
    Owner
    Mangen Research Associates, Inc.
    -------------------------------------------








  • 6.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 13:16

    In a case like this, you need to use some kind of range restriction to control what is read.

    You can restrict the rows and columns of the input spreadsheet using

    RANGE="sheetname$A3:J22";

    in the PROC IMPORT statement. This will enable you to ascertain where the problems are not.



    -------------------------------------------
    Paul Thompson
    Director, Methodology and Data Analysis Center
    Sanford Research/USD
    -------------------------------------------








  • 7.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 13:22
    Chris,

    My guess on the SAS side is that it doesn't have a sufficiently large line buffer.  Look in the log window and compare the LRECL= value to the reported maximum record length.  LRECL= must be larger.  If not, increase by adding LRECL= XXXX to the infile statement.  With 300+ columns, my guess is you need ca LRECL=3000 unless each column is much less than 10 characters each.

    -------------------------------------------
    Philip Dixon
    Iowa State Univ
    -------------------------------------------





  • 8.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 13:57
    In SAS with the .csv file, I would try including these options in the INFILE statement:
      LRECL= (as mentioned by Philip) DSD MISSOVER

    -------------------------------------------
    Mark Martin
    Siemens Healthcare Diagnostics
    -------------------------------------------








  • 9.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 16:07
    Although SPSS is not open source for  the whole package, The I/O DLLs are free if you want to roll your own.

    If you send me the files, I'll see if I can read them into SPSS.
    How many rows and columns do you expect?
    Are the string variables in quotes?




    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------








  • 10.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 17:44
    I tested SPSS and excel with data generated this way.
    *create file with 400 variables to see if SPSS can read a 400 variable CSV file.
    new file.
    input program.
       vector x (400,f3).
       loop id = 1 to 3.
          loop #p = 1 to 400.
             compute x(#p) = rnd(rv.normal(50,10)).
          end loop.
          end case.
       end loop.
       end file.
    end input program.
    SAVE TRANSLATE OUTFILE='C:\project\test400.csv'
      /TYPE=CSV
      /ENCODING='Locale'
      /MAP
      /REPLACE
      /FIELDNAMES
      /CELLS=VALUES.

    I was able to read in the file with excel.
    Using theGUI <file> <read text data>
    I was able to read the file into SPSS.

    See if you can work with the attached files.
    the second one was created with
            compute x(#p) = rnd(rv.normal(50000000,10)).

    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------








  • 11.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 20:13

    Hi , I read files that large into sAs all the time. What method are you using to read it in? I might ask on some of the SAs users forums. I would think R could read in a file that large as well. I know I have read some pretty large ones into R as well. I know SAS can easily handle 300 variables and millions of records on the right platform so I agree with Jay that it might be some other problem.


    -------------------------------------------
    Laura Kapitula
    Assistant Professor
    Grand Valley State University
    -------------------------------------------





  • 12.  RE:Excel files with more than 300 columns

    Posted 08-13-2013 21:05

    Chris,
    Like some of the others that have responded to your question, I also have not yet experienced a problem due to too many columns (in R).

    There is one instance when I encountered a problem with a messy large file. The problem was that the text file included text entries with commas and some special characters. I would get an error message that the rows did not have the right number of entries (as R could not parse the entries appropriately). 

    It would be helpful to see the error messages that you get (in R or SAS).

    -------------------------------------------
    Moni Neradilek
    Statistical Consultant
    The Mountain-Whisper Light Statistics
    -------------------------------------------








  • 13.  RE:Excel files with more than 300 columns

    Posted 08-16-2013 14:42

    There are a couple of tricks for getting R to read in large CSV files, such as specifying the column types so R doesn't have to figure this out itself.

    However, a better idea is to look at the R packages for directly reading Microsoft Excel files, in particular the "xlsx" package.  (Also the "read.xls" in my own "gdata" package.)

    -Greg


    -------------------------------------------
    Gregory Warnes
    Owner and Chief Scientist
    Gregory R. Warnes Consulting
    -------------------------------------------