        |   A   |   B   |   C   |   D   |   E   |
    ____|_______|_______|_______|_______|_______|
      1 |   1   |   2   |   3   |   4   |   5   |
    ____|_______|_______|_______|_______|_______|
      2 |       |       |       |       |       |
    ____|_______|_______|_______|_______|_______|
      3 |       |   A   |       |   B   |       |
    ____|_______|_______|_______|_______|_______|
      4 |       |       |       |       |   Z   |
    ____|_______|_______|_______|_______|_______|
      5 | 1,400 |       |  250  |       |       |
    ____|_______|_______|_______|_______|_______|

Then, the resulting CSV file will contain the following lines (records):

    1,2,3,4,5
    ,,,,
    ,A,,B,
    ,,,,Z
    "1,400",,250,,
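The sketch below illustrates why blank and partly filled rows still produce a full set of separators: every record is padded to the width of the widest row. It assumes, purely for illustration, that the worksheet cells have already been read into String arrays holding their displayed text (null for an empty cell); the class and method names are hypothetical and not part of this class's API. Only the minimal Excel-style escaping needed for "1,400" is included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: turns rows of displayed cell text (null = empty cell)
// into CSV records, padding every record to the width of the widest row.
final class GridToCsvSketch {

    static List<String> toRecords(List<String[]> rows, char separator) {
        // Find the widest row so that every record has the same field count.
        int width = 0;
        for (String[] row : rows) {
            width = Math.max(width, row.length);
        }
        List<String> records = new ArrayList<>();
        for (String[] row : rows) {
            StringBuilder record = new StringBuilder();
            for (int col = 0; col < width; col++) {
                if (col > 0) {
                    record.append(separator);
                }
                String value = col < row.length && row[col] != null ? row[col] : "";
                // Minimal Excel-style escaping: enclose a field containing the
                // separator in speech marks (other cases are omitted here).
                if (value.indexOf(separator) >= 0) {
                    value = "\"" + value + "\"";
                }
                record.append(value);
            }
            records.add(record.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "2", "3", "4", "5"},
                new String[] {},
                new String[] {null, "A", null, "B"},
                new String[] {null, null, null, null, "Z"},
                new String[] {"1,400", null, "250"});
        // Prints the five records listed above, one per line.
        toRecords(rows, ',').forEach(System.out::println);
    }
}
```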
Typically, the comma is used to separate the fields that, together, constitute a single record (line) within the CSV file. This is not, however, a hard and fast rule, so this class allows the user to choose which character is used as the field separator and assumes the comma if none is specified.
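A minimal sketch of a configurable separator that falls back to the comma might look like this. The class name, constructors and DEFAULT_SEPARATOR constant are hypothetical illustrations, not this class's actual API, and escaping of embedded characters is deliberately left out.

```java
import java.util.List;

// Hypothetical sketch: joins the fields of one record using a caller-chosen
// separator, assuming the comma when none is specified.
final class RecordJoiner {
    static final String DEFAULT_SEPARATOR = ",";

    private final String separator;

    // No separator specified: assume the comma.
    RecordJoiner() {
        this(DEFAULT_SEPARATOR);
    }

    RecordJoiner(String separator) {
        if (separator == null || separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        this.separator = separator;
    }

    // Joins the fields as-is; no escaping is performed in this sketch.
    String join(List<String> fields) {
        return String.join(separator, fields);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("1", "2", "3");
        System.out.println(new RecordJoiner().join(fields));    // 1,2,3
        System.out.println(new RecordJoiner(";").join(fields)); // 1;2;3
    }
}
```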
If a field contains the separator, it will be escaped. If the file should obey Excel's CSV formatting rules, the field will be enclosed in speech marks (double quotes), whereas if it should obey UNIX conventions, each occurrence of the separator will be preceded by a backslash character.
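The two conventions for an embedded separator can be sketched as follows. The Convention enum and escapeSeparator method are hypothetical names chosen for illustration, and the Excel branch here ignores embedded speech marks, which are dealt with separately below.

```java
// Hypothetical sketch of separator escaping under the Excel and UNIX
// conventions; names are illustrative only.
final class SeparatorEscaping {
    enum Convention { EXCEL, UNIX }

    static String escapeSeparator(String field, String separator, Convention convention) {
        if (!field.contains(separator)) {
            return field; // nothing to escape
        }
        if (convention == Convention.EXCEL) {
            // Excel: enclose the whole field in speech marks.
            return "\"" + field + "\"";
        }
        // UNIX: precede every occurrence of the separator with a backslash.
        // String.replace works on literal text, so separators such as "|"
        // need no regular-expression quoting.
        return field.replace(separator, "\\" + separator);
    }

    public static void main(String[] args) {
        System.out.println(escapeSeparator("1,400", ",", Convention.EXCEL)); // "1,400"
        System.out.println(escapeSeparator("1,400", ",", Convention.UNIX));  // 1\,400
    }
}
```

Using literal replacement rather than a regular expression is a design choice worth keeping if the separator is user-selectable, since characters such as "|" or "." have special meaning in a regex.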
If a field contains an end of line (EOL) character, it too will be escaped. If the file should obey Excel's CSV formatting rules, the field will again be enclosed in speech marks. If, on the other hand, the file should follow UNIX conventions, a single backslash will precede the EOL character. There is no single applicable standard for UNIX; some applications replace the CR character with the two-character sequence \r and the LF character with \n, but this class does not do so.
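A sketch of EOL escaping under both conventions follows. It assumes that both CR ('\r') and LF ('\n') count as EOL characters and treats each one separately, so a CRLF pair receives two backslashes; the class and method names are hypothetical. Note that the UNIX branch keeps the EOL character itself rather than rewriting it as the text \r or \n.

```java
// Hypothetical sketch of EOL escaping; assumes CR and LF are both EOL
// characters and handles each one individually.
final class EolEscaping {
    enum Convention { EXCEL, UNIX }

    static boolean containsEol(String field) {
        return field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
    }

    static String escapeEol(String field, Convention convention) {
        if (!containsEol(field)) {
            return field; // nothing to escape
        }
        if (convention == Convention.EXCEL) {
            // Excel: enclose the whole field in speech marks.
            return "\"" + field + "\"";
        }
        // UNIX: put a single backslash before each CR or LF, leaving the
        // character itself in place.
        StringBuilder out = new StringBuilder(field.length() + 2);
        for (char c : field.toCharArray()) {
            if (c == '\n' || c == '\r') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String field = "line one\nline two";
        System.out.println(escapeEol(field, Convention.EXCEL));
        System.out.println(escapeEol(field, Convention.UNIX));
    }
}
```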
If a field contains double quotes, that character must also be escaped. UNIX does not seem to define a convention for this, whereas Excel does. Should the CSV file have to obey Excel's formatting rules, each speech mark will be escaped with a second speech mark, and the entire field will then be enclosed in a further set of speech marks. Thus, if the text "Hello" he said appeared in a cell, it would become the following field in the CSV file: """Hello"" he said".
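Because a single field may contain speech marks together with the separator or an EOL character, a sketch of Excel-style escaping has to enclose the field only once. The version below, under that assumption, doubles every embedded speech mark and then encloses the field if it contains any of the three; the class and method names are hypothetical.

```java
// Hypothetical sketch of Excel-style escaping: doubles each embedded speech
// mark and encloses the field once if it contains speech marks, the
// separator or an EOL character.
final class ExcelFieldEscaping {

    static String escape(String field, String separator) {
        boolean needsEnclosing = field.contains("\"")
                || field.contains(separator)
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsEnclosing) {
            return field;
        }
        // Double each speech mark, then enclose the whole field exactly once.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Prints: """Hello"" he said"
        System.out.println(escape("\"Hello\" he said", ","));
        // Prints: "a,""b"""
        System.out.println(escape("a,\"b\"", ","));
    }
}
```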
Finally, it is worth noting that talk of CSV 'standards' is slightly misleading, as no universally adopted standard exists. The code in this class may well have to be modified to produce files that suit a specific application or requirement.
@author Mark B
@version 1.00 9th April 2010
         1.10 13th April 2010 - Added support for processing all Excel
              workbooks in a folder, along with the ability to specify a
              field separator character.
         2.00 14th April 2010 - Added support for embedded characters: the
              field separator, EOL and double quotes (speech marks). In
              addition, gave the client the ability to select how these are
              handled, obeying either Excel's or UNIX formatting conventions.