On a recent project I stumbled upon a unique problem: splitting a large flat text file (in my case a large CSV) into equal parts so that processing would be more manageable.
So the problem: split up a CSV file with 100,000 rows of data.
After going through a few search engines, I found that the problem was not as easy as I first thought.
One PHP class looked promising at first, but a closer look showed that it would not do the job:
function run(){
    $i=0;
    $j=1;
    $date = date("m-d-y");
    unset($buffer);
    $handle = @fopen ($this->Getsource(), "r");
    while (!feof ($handle)) {
        $buffer .= @fgets($handle, 4096);
        $i++;
        if ($i >= $split) {
            $fname = $this->Getpath()."part.$date.$j.txt";
            if (!$fhandle = @fopen($fname, 'w')) {
                print "Cannot open file ($fname)";
                exit;
            }
            if (!@fwrite($fhandle, $buffer)) {
                print "Cannot write to file ($fname)";
                exit;
            }
            fclose($fhandle);
            $j++;
            unset($buffer,$i);
        }
    }
    fclose ($handle);
}
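The script does not split reliably by line because of this call:
fgets($handle, 4096);
The second argument of fgets caps how much is read per call: reading stops at the end of a line or after length - 1 bytes (here 4,095), whichever comes first. Any line longer than that comes back in several pieces, each counted as a line, so the file ends up split partly by size rather than by line, which was not desirable.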
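To make the effect of the length argument concrete, here is a small sketch. The helper name count_reads() and the sample data are made up for illustration; only fgets(), fopen() and the other built-in calls are real PHP functions.

```php
<?php
// Hypothetical helper: count how many fgets() calls it takes to read a file,
// with or without a length argument.
function count_reads($path, $length = null) {
    $handle = fopen($path, 'r');
    $reads = 0;
    while (true) {
        $chunk = ($length === null) ? fgets($handle) : fgets($handle, $length);
        if ($chunk === false) {
            break; // end of file
        }
        $reads++;
    }
    fclose($handle);
    return $reads;
}

// Sample file with two lines: one short, one 10,000 bytes long
$path = tempnam(sys_get_temp_dir(), 'fgets');
file_put_contents($path, "short\n" . str_repeat('x', 10000) . "\n");

$whole_lines = count_reads($path);       // no length: one read per line
$capped      = count_reads($path, 4096); // length 4096: the long line arrives in pieces
unlink($path);

echo "reads without length: $whole_lines\n";
echo "reads with length 4096: $capped\n";
```

On this input the capped call needs more reads than the file has lines, so a counter that treats each read as one line drifts away from the real line count.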
My Solution:
Although the above script did not work, I was inspired by its processing logic of iterating through the file line by line and adding the lines read to segmented files. Here is what I came up with:
/**********************************************************************
* Author: Chang Xiao (xiaochangfeng@gmail.com)
* Web...: http://www.chang2chang.com
* Name..: File Splitter
* Desc..: A simple file splitter to split csv (or text files)
* Date..: 6/2/2010
* PHP Version: 5.x
*/
/*
* Usage Example
*
*
* // 1. Input and Output files in the current directory
*
* $file = new split_file();
* $result = $file->file_split('randomdata.csv', 'split', 45);
*
*
* // 2. Input file in a directory called source, output files into
* // data directory
*
* $file = new split_file('data');
* $result = $file->file_split('source/randomdata.csv', 'split', 45);
* echo count($result) . ' files created';
*
*/
class split_file {

    /*
     * @string
     * Full path or relative path to where the output files will be created
     */
    private $log_path;

    /*
     * Constructor
     */
    public function __construct($dest_path = '') {
        // default to the current directory
        $this->log_path = ($dest_path == '') ? '.' : $dest_path;
    }

    /*
     ****************************************************************************
     * Split the input (text) file into smaller files by number of lines
     *
     * @params
     * (string) source_file   | File path of the source text file
     * (string) output_prefix | File prefix for the output files
     * (int)    split_count   | Number of lines per split file
     *
     * @returns
     * (string) file_string   | The original file contents, if the number
     *                        | of lines was not more than the split count
     * (array)  return_files  | Array containing the file paths of the
     *                        | split files
     * FALSE                  | The source file was empty or unreadable
     */
    public function file_split($source_file, $output_prefix, $split_count) {
        $file_string = file_get_contents($source_file);
        if(!$file_string) {
            return false;
        }
        // mark every line break with <br /> so it can be counted and split on
        $_file_string = nl2br($file_string);
        $total_line_count = substr_count($_file_string, "<br />");
        // if not enough lines to split, we just return the csv string back
        if($total_line_count <= $split_count) {
            return $file_string;
        }
        $return_files = array();
        // turn the huge file into a 1D array
        $data = explode("<br />", $_file_string);
        // file name append increment counter
        $x = 0;
        // split increment counter
        $y = 0;
        // lines collected for the current output file
        $buffer = '';
        // walk through the new data array
        for($i = 0; $i < count($data); $i++) {
            $buffer .= $data[$i];
            $y++;
            if($y >= $split_count) {
                $x++;
                $this->_write(trim($buffer), $output_prefix . $x . '.csv');
                $return_files[] = $this->log_path . '/' . $output_prefix . $x . '.csv';
                // start over
                $y = 0;
                $buffer = '';
            }
        }
        // when we reach the end, write what is left (skip an empty remainder)
        $buffer = trim($buffer);
        if($buffer !== '') {
            $x++;
            $this->_write($buffer, $output_prefix . $x . '.csv');
            $return_files[] = $this->log_path . '/' . $output_prefix . $x . '.csv';
        }
        return $return_files;
    }

    /***************************************************************
     * Internal function that writes lines into files (appending)
     *
     * @params
     * (string) message   | content to be written
     * (string) file_name | just the file name of the file to be written
     *
     * @returns
     * TRUE  | write successful
     * FALSE | write unsuccessful
     */
    private function _write($message, $file_name) {
        if(!is_dir($this->log_path)) {
            // the mode must be octal: 0777, not decimal 777
            mkdir($this->log_path, 0777, true);
        }
        $_file_name = $this->log_path . '/' . $file_name;
        if($handle = fopen($_file_name, 'a')) {
            $re = fwrite($handle, $message);
            $re2 = fclose($handle);
            if ( $re !== false && $re2 !== false ) return true;
        }
        return false;
    }
}
You can download the class file here: split_file.class.php
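Because the class loads the whole source file with file_get_contents, memory use grows with the size of the input. For very large files, a streaming variant may be worth considering. The following is only a sketch under my own assumptions, not part of split_file: the function name split_by_lines() and its parameters are hypothetical, and it uses 'w' rather than 'a' so that a rerun overwrites earlier parts.

```php
<?php
// Sketch only: split_by_lines() is a hypothetical helper, not part of split_file.
// It reads one line at a time, so only the current line is held in memory.
function split_by_lines($source_file, $dest_dir, $output_prefix, $lines_per_file) {
    if ($lines_per_file < 1) {
        throw new InvalidArgumentException('lines_per_file must be at least 1');
    }
    $in = fopen($source_file, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source_file");
    }
    if (!is_dir($dest_dir)) {
        mkdir($dest_dir, 0777, true);
    }

    $files = array();     // paths of the parts written so far
    $out = null;          // handle of the part currently being written
    $lines_in_part = 0;

    // fgets() without a length returns one whole line, newline included
    while (($line = fgets($in)) !== false) {
        if ($out === null) {
            // open the next part lazily, so no empty trailing file is created
            $name = $dest_dir . '/' . $output_prefix . (count($files) + 1) . '.csv';
            $out = fopen($name, 'w');
            if ($out === false) {
                fclose($in);
                throw new RuntimeException("Cannot open $name");
            }
            $files[] = $name;
        }
        fwrite($out, $line);
        if (++$lines_in_part >= $lines_per_file) {
            fclose($out);
            $out = null;
            $lines_in_part = 0;
        }
    }
    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $files;
}
```

A call such as split_by_lines('source/randomdata.csv', 'data', 'split', 45) would mirror the second usage example in the class header. Like the class, this counts physical lines, so a quoted CSV field that contains a line break would still be cut in two.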

Chang is a twenty-something web technology enthusiast who loves open source web application design and development. In his spare time, he loves to explore new technologies, web 2.0 and its business implications.