Converting delimited text to Excel

Description Google Results

Non-technical people need to be able to work with data. They usually end up reaching for Excel or Access because we live in a malevolent Universe.

Fortunately for the Perl kids there are a couple excellent modules already done for you by our friends John McNamara (Spreadsheet::WriteExcel) and Kawai Takanori (Spreadsheet::ParseExcel). Here is an example of how you can turn Excel into delimited plain text: converting Excel to text.

Below is a very useful and fairly generic subroutine that can take all kinds of delimited files and turn them into straightforward Excel files.

Code
sub text_to_excel {
# %args should look something like...
# ( delimiter => "\t",
# recordsep => "\n",
# file => "/path/to/file.txt"
# name => "Sheet Title" )
# the only required args are delimiter and file

# we require instead of use to save on if we never end up using it in
# a larger script or CGI, but use statements at the top of the script
# are clearer for other programmers to follow.
require Spreadsheet::WriteExcel;
require IO::Scalar;

my %args = @_;
my ( $delimiter, $recordsep, $file, $name ) =
@args{qw( delimiter recordsep file name )};

$delimiter and $file or
die "Must provide at least delimiter and file as args to" .
"delimited_text_to_excel().";

-e $file or
die "There is no file: $file\n";

open F, "<", $file or croak("Can't open $file: $!");
$/ = $recordsep || "\n";
my @data = <F>;
close F;
$/ = "\n";

my $xls_str;
tie *XLS, 'IO::Scalar', \$xls_str;

my $workbook = Spreadsheet::WriteExcel->new(\*XLS);

my $worksheet = $workbook->addworksheet($name||'Page 1');

for ( my $row = 0; $row < @data; $row++ ) {

chomp( my @line = split /$delimiter/, $data[$row] );

for ( my $col = 0; $col < @line; $col++ ) {
$worksheet->write_string($row, $col, $line[$col] || "");
}
}
$workbook->close();
return $xls_str;
}
Usage
use MIME::Lite;  # we want to mail our excel sheet

my $file = '/data/profit_forcast';
my $name = '2006 Profit Forcast';
my $xls_data = text_to_excel( file => $file,
delimiter => "\t",
name => $name );

# we've done all the work. $xls_data IS the excel file in a raw
# format. we could do anything with it now, including writing it to a
# file, but let's send it via email.

my $msg = MIME::Lite->new(From => 'traitor@sedition.com',
To => 'tuna@fish.net',
Cc => 'traitor@sedition.com',
Subject => $name,
Type => 'multipart/mixed')
or die "PROBLEM opening MIME object: $!";

$msg->attach(Type => 'application/vnd.ms-excel',
Disposition => 'attachment',
Data => $xls_data,
Filename => $name . '.xls')
or die "PROBLEM attaching Excel file: $!";

$msg->send() or die "PROBLEM sending MIME mail: $!";

print "Sent $name!\n";

Discussion

Anyone who’s dealt with delimited files before knows that this approach is missing a way to balance delimiters. Eg: If your field delimiter is a tab and your record delimiter is a newline and one of the text fields has a tab or a return character in it, it will wreck the results.

To work with this, I often use the NULL character (”\0”) as a field delimiter and a double (”\0\0”) as a record delimiter. It will never appear in regular files so you don’t have to resort to Text::Balanced or something to ensure your data integrity.

If you will ever have empty fields that cause the field delimiter to double up, you’ll have to get crafty and do something like “\0”.‘_RS_’.”\0” for the record separator.

$xls_data = text_to_excel( file      => '/path/to/file.txt',
delimiter => "\0",
recordsep => "\0\0",
name => 'NULL delimited file' );

comment 0 comments:

Popular Posts

Linux Gazette