On a recent project I stumbled upon an unusual problem: splitting a large flat text file (in my case, a large CSV) into equal parts so that processing it would be more manageable.
The problem: split a CSV file with 100,000 rows of data into smaller files with a fixed number of rows each.
After a few rounds of searching, I found that the problem was not as easy as I first thought.
There was one PHP class that initially looked promising.
On closer inspection, though, its run() method turned out to have problems:
function run(){
    $i=0;
    $j=1;
    $date = date("m-d-y");
    unset($buffer);
    $handle = @fopen ($this->Getsource(), "r");
    while (!feof ($handle)) {
        $buffer .= @fgets($handle, 4096);
        $i++;
        if ($i >= $split) {
            $fname = $this->Getpath()."part.$date.$j.txt";
            if (!$fhandle = @fopen($fname, 'w')) {
                print "Cannot open file ($fname)";
                exit;
            }
            if (!@fwrite($fhandle, $buffer)) {
                print "Cannot write to file ($fname)";
                exit;
            }
            fclose($fhandle);
            $j++;
            unset($buffer,$i);
        }
    }
    fclose ($handle);
}
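The script does not work as written, and the obvious suspect is this call:
fgets($handle, 4096);
However, the second argument of fgets() is an upper bound, not a fixed read size: reading stops at a newline, at EOF, or after length - 1 (here 4095) bytes, whichever comes first. Ordinary CSV rows are therefore still read one line at a time, and only rows longer than 4095 bytes get cut in the middle. The bigger problems are elsewhere: $split is never defined inside run(), so $i >= $split is true after the very first line and every line ends up in its own file; and whatever is left in $buffer when the loop reaches EOF is never written, so the last partial chunk is silently dropped.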
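The PHP manual describes fgets()'s length argument as a cap on how much one call returns, not as a block size. A quick way to check this is to feed fgets($handle, 4096) a file with two short rows and one 5000-byte row. This is a minimal test sketch; the temporary file and its contents are made up purely for the demonstration.

```php
<?php
// Check how fgets() treats its length argument: reading stops at a newline,
// at EOF, or after (length - 1) bytes, whichever comes first.

// Hypothetical test data: two short CSV rows, then one 5000-byte row.
$tmp = tempnam(sys_get_temp_dir(), 'fgets_demo_');
file_put_contents($tmp, "a,b,c\n1,2,3\n" . str_repeat('x', 5000) . "\n");

$handle = fopen($tmp, 'r');
$pieces = array();
while (($line = fgets($handle, 4096)) !== false) {
    $pieces[] = $line;   // each call returns at most 4095 bytes
}
fclose($handle);
unlink($tmp);

// Short rows come back whole (newline included); the long row is cut after
// 4095 bytes and its remaining 905 bytes plus "\n" arrive in the next call.
// prints: 4 pieces: 6, 6, 4095, 906 bytes
printf("%d pieces: %s bytes\n", count($pieces), implode(', ', array_map('strlen', $pieces)));
```

In other words, a 4096-byte limit only matters for rows longer than 4095 bytes; calling fgets($handle) with no length argument reads each line whole regardless of its size.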
My Solution:
Although the above script did not work, I was inspired by its logic of iterating through the file line by line and adding each line to a segmented output file. Here is what I came up with:
/**********************************************************************
* Author: Chang Xiao (xiaochangfeng@gmail.com)
* Web...: http://www.chang2chang.com
* Name..: File Splitter
* Desc..: A simple file splitter to split csv (or text files)
* Date..: 6/2/2010
* PHP Version: 5.x
*/
/*
* Usage Example
*
*
* // 1. Input and Output files in the current directory
*
* $file = new split_file();
* $result = $file->file_split('randomdata.csv', 'split', 45);
*
*
* // 2. Input file in a directory called source, output files into
* // data directory
*
* $file = new split_file('data');
* $result = $file->file_split('source/randomdata.csv', 'split', 45);
* if (is_array($result)) {
*     echo count($result) . ' files created';
* }
*
*/
class split_file {
    /*
     * @var string
     * Full path or relative path to the directory where the output files will be created
     */
    private $log_path;

    /*
     * Constructor: defaults the output directory to the current directory
     */
    public function __construct($dest_path = '') {
        if($dest_path == '') {
            $this->log_path = '.';
        } else {
            $this->log_path = $dest_path;
        }
    }
    /*
     ****************************************************************************
     * Split the input (text) file into smaller files by number of lines
     *
     * @params
     * (string) source_file   | File path of the source text file
     * (string) output_prefix | File prefix for the output files
     * (int)    split_count   | Number of lines per split file
     *
     * @returns
     * (string) file_string   | The original text file string, returned
     *                        | unchanged if the number of lines is less
     *                        | than or equal to the split count
     *
     * (array)  return_files  | Array containing the file paths of
     *                        | the split files
     */
    public function file_split($source_file, $output_prefix, $split_count) {
        $file_string = file_get_contents($source_file);
        if($file_string === false) {
            // the source file could not be read
            return false;
        }
        // normalise Windows / old Mac line endings to "\n", drop trailing
        // newlines and turn the file into an array of lines
        $normalised = str_replace(array("\r\n", "\r"), "\n", $file_string);
        $data = explode("\n", rtrim($normalised, "\n"));
        $total_line_count = count($data);
        // if not enough lines to split, we just return the csv string back
        if($total_line_count <= $split_count) {
            return $file_string;
        }
        $return_files = array();
        // file name append increment counter
        $x = 0;
        // split increment counter
        $y = 0;
        $buffer = '';
        // walk through the lines
        for($i = 0; $i < $total_line_count; $i++) {
            $buffer .= $data[$i] . "\n";
            $y++;
            // write a file when the chunk is full or when we reach the last line
            // (a single check, so no empty trailing file is created)
            if($y >= $split_count || $i == $total_line_count - 1) {
                $x++;
                $this->_write($buffer, $output_prefix . $x . '.csv');
                $return_files[] = $this->log_path . '/' . $output_prefix . $x . '.csv';
                // start over
                $y = 0;
                $buffer = '';
            }
        }
        return $return_files;
    }
    /***************************************************************
     * Internal function that writes lines into files (appending)
     *
     * @params
     * (string) message   | content to be written
     * (string) file_name | just the file name of the file to be written
     *
     * @returns
     * TRUE  | write successful
     * FALSE | write unsuccessful
     */
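    private function _write($message, $file_name) {
        // create the output directory (and any missing parents) on first use;
        // the mode must be octal 0777, not decimal 777
        if(!is_dir($this->log_path)) {
            mkdir($this->log_path, 0777, true);
        }
        $_file_name = $this->log_path . '/' . $file_name;
        if($handle = fopen($_file_name, 'a')) {
            $re = fwrite($handle, $message);
            $re2 = fclose($handle);
            // fwrite() returns the byte count (possibly 0) or false on failure
            return $re !== false && $re2;
        }
        return false;
    }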
}
You can download the class file here: split_file.class.php
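Both scripts above rely on walking through the file line by line, but file_split() still reads the entire file into memory with file_get_contents() and then keeps an exploded copy of it as an array of lines. For files much larger than the 100,000-row example, a streaming variant that never holds more than one line at a time may be preferable. The sketch below is not part of split_file.class.php: the function name split_file_streaming and its parameters are hypothetical, it writes .csv parts like the class does, and it assumes the destination directory is writable.

```php
<?php
/**
 * Hypothetical streaming splitter (a sketch, not the downloadable class).
 * Reads the source one line at a time, so memory use does not grow with
 * the file size. Returns the list of part files written.
 *
 * Caveat: any line-based split breaks CSV records whose quoted fields
 * contain embedded newlines.
 */
function split_file_streaming($source_file, $dest_path, $output_prefix, $split_count)
{
    if ($split_count < 1) {
        throw new InvalidArgumentException('split_count must be at least 1');
    }
    if (!is_dir($dest_path) && !mkdir($dest_path, 0777, true)) {
        throw new RuntimeException("Cannot create directory $dest_path");
    }
    $in = fopen($source_file, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source_file");
    }

    $files = array();
    $out = null;    // handle of the part currently being written
    $lines = 0;     // number of lines written to the current part

    // fgets() without a length argument returns a whole line, however long
    while (($line = fgets($in)) !== false) {
        if ($out === null) {
            // open the next part only when a line is waiting for it,
            // so no empty trailing file is ever created
            $name = $dest_path . '/' . $output_prefix . (count($files) + 1) . '.csv';
            $out = fopen($name, 'w');
            if ($out === false) {
                fclose($in);
                throw new RuntimeException("Cannot open $name");
            }
            $files[] = $name;
        }
        fwrite($out, $line);
        if (++$lines >= $split_count) {
            // current part is full: close it and start counting again
            fclose($out);
            $out = null;
            $lines = 0;
        }
    }
    if ($out !== null) {
        fclose($out);   // last, partially filled part
    }
    fclose($in);
    return $files;
}
```

Usage would mirror the class, e.g. split_file_streaming('source/randomdata.csv', 'data', 'split', 45) returns the paths of the parts. Opening each part only when its first line arrives is what keeps the row count an exact multiple of the split size from producing an empty extra file.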