Regular Expressions External Tool
Download
regextool.zip (239KB), last update: 02.11.2011
Description
This is a ready to use configurable external tool. This tool uses the boost regular expression engine to parse and format the input data. The configuration is done in the external tool’s ini file. This enables to quickly extend the application to import new formats.
The tool has too kind of parsing method. A row & field matching method and a search & replace method. You can use one or both of them, the row/field method will be executed first, the search/replace method as second step.
With the row & field matching method you define a search pattern that identify a row of data, then you define a separate search pattern for every field that will be applied to every row match to extract the field data. The matched fields are written to the output with the given format and flags. You use this method to parse xml files like the Open Financial Exchange, or txt files with data on multiple lines like the Quicken Interchange Format.
The search & replace method work like a simple search and replace. A pattern is searched and replaced with the given format and flags, the result is then written to the output. You use this method to parse simply txt files, like csv or tsv files. This is the preferred method if you simply have to prepend the header line, or change the format of the values.
If an error occurs, in the clipboard or in the given output files an error message starting with the text "error:" will be written and the value 1 is returned to the launching application.
Through the following parameters you can configure the tool:
Base parameters:
| useclipboard | 1 | If set to "true" the filter output the result to the clipboard, otherwise to the given output file. |
| filter_codepage | 0/1 |
The code page used to parse the input data. You can insert the code page number or one of the followings strings ACP, MACCP, OEMCP, SYMBOL, THREAD_ACP, UTF-7, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF32LE. If empty or not set the tool look if a BOM (byte order mask) is present in the file, or try to guess the code page in the following order: UTF16LE, UTF16BE, UTF8, ANSI. The result is written to the output file in UTF-8, or to the clipboard in UTF-16LE. |
| filter_header | 0/1 |
Define the header to prepend in the result data. This is equivalent to the column header. It is possible to use escape sequences, like '\t' for tabulator. Example: 'Date\tDoc\tDescription\tIncome\tExpenses' |
| filter_reverse_order | 0/1 | Output the transaction in reverse order. |
| filter_archive_path | 0/1 | If a zip or tgz archive containing more than one file is opened, it is possible through this parameter to define the name of the file in the archive to be readen. This parameter can contain perl regular expression. Only the first file that match the parameter will be imported. |
| debug | 0/1 | If set to "true", the tool write to the console the results of every regular expressions. |
Parameters for the row & field matching method:
| filter_row_match | 0/1 | Search pattern that identify a row (transaction, account, ...). | 1 globally |
| filter_row_flags | 0/1 | Flags used for the filter_row_match. | |
| filter_field_match | 0+ | Search pattern that identify a field in a row (Date, description, amount, ...). This pattern is applied to the result of the filter_row_match pattern. | 1 of each for every field |
| filter_field_format | 0+ | Pattern for the formatting of the output of the matching field. | |
| filter_field_flags | 0+ | Flags used for the filter_field_match. |
Note: You need the same number of filter_field_match, filter_field_format, filter_field_flags parameters.
Parameters for the search & replace method:
| filter_match | 0+ | Search pattern. | 1 of each for every rule, more rules are possible |
| filter_format | 0+ | Format pattern for the output of the replaced text. | |
| filter_flags | 0+ | Flags used for the filter_field_match. |
Note: in the ini file you need the same number of filter_match, filter_format, filter_flags parameters. With the flag "format_no_copy" the text that is not part of the match will not be copied to the output, this is useful in case you want to delete part of the content.
The syntax for the match, format and flags parameters follow the syntax of the boost regular expression engine (like the syntax of pearl regular expression). The description, some examples and tutorials can be found on the following page: Boost.RegEx.
Parameters for line sorting:
The followings parameters are used to sort the lines, it is possible through regular expressions to identify the part of text used for sorting and the order of the sort.
| filter_sort_match | 0/1 | Search pattern. | 1 of each for every rule, more rules are possible |
| filter_sort_format | 0/1 | Format pattern for the output of the replaced text. | |
| filter_sort_flags | 0/1 | Flags used for the sorting match | |
| filter_sort_order | 0/1 | Order for sorting "ascending" or "descending" |
Optional command line parameters
| -debug | Overwrite the debug setting in the ini file to true, this is used only for debugging. |
| -useoutfile | Overwrite the useclipboard setting in the ini file to false, this is used only for debugging. |
| -version | Print the version of the file to the console. |
Regular expression syntax
The syntax for the match, format and flags parameters follow the syntax of the boost regular expression engine (like the syntax of pearl regular expression).
The documentation of Boost Regular Expression is found at the following page: Boost.RegEx.
If you are new to regular expressions you can start from the following tutorial pages:
Perl regular expressions
Perl regular expressions tutorials
Short tutorial Using regular expressions from the University of Virginia
Requirements
Banana Accounting 5.0.7 or above.
Installation
Download the file regex_filter.zip attached to this page. Unpack and copy the content to "{Program files}\Banana50\ExternalTools" folder. ( {Program files} is the path where the programs are installed, usually "C:\Program files") . Start Banana Accounting 5.0. Open an account and go to the menu "Account1" - "Import to the accounting", you should now see the pre-defined formats in the list.
If you want to define a new format to import, copy an existing *.ini file and modify the tool's parameters.
Pre-defined formats
Version regextool: 1.0.9.0 (02.11.2011)
| Ini file | Version | Last Update | Description |
| cbi-import | 1.0 | 23.11.2010 | CBI (TXT) |
| china-construction-bank-csv.ini | 1.1 | 16.12.2008 | China Construction Bank (CSV) |
| cs-import.ini | 1.1 | 04.06.2009 | Credit Suisse (CSV) |
| hypovereinsbank-import.ini | 1.4 | 24.03.2009 | Hypovereinsbank (CSV) |
| lukb-import.ini | 1.2 | 24.07.2009 | Luzerner Kantonalbank (CSV) |
| mijnpostbank-import.ini | 1.1 | 10.11.2010 | Mijnpostbank (CSV) |
| ofc-import.ini | 1.1 | 16.12.2008 | Open financial connectivity (OFC) |
| ofx-import.ini | 1.1 | 16.12.2008 | Open Financial Exchange (OFX) |
| v11-import.ini | 1.2 | 16.12.2010 | Postfinance ESR V11 (V11) |
| pf-import.ini | 1.2 | 02.11.2011 | Postfinace xml account card (XML,TAR.GZ) |
| pf-csv-import.ini | 1.3 | 04.08.2011 | Postfinace csv account card (CSV) |
| qif-import.ini | 1.2 | 16.12.2008 | Quicken Interchange Format (QIF) |
| rabobank-import.ini | 1.0 | 11.11.2010 | Rabobank (CSV) |
| raif-import.init | 1.1 | 15.11.2010 | Raiffeinsen Bank (CSV) |
| sgkb-import.ini | 1.2 | 24.07.2009 | St Gallen Kantonalbank (CSV) |
| tkb-import.ini | 1.2 | 24.07.2009 | Thurgauer Kantonalbank (CSV,ZIP) |
| ubs-import.ini | 1.13 | 28.06.2011 | UBS Bank (CSV) |
| zkb-import.ini | 1.3 | 30.03.2010 | Zürcher Kantonalbank (CSV) |
Changelog
2011-11-02
Updated regextool to correct an error with some tgz files
Updated postfinance xml to import account cost too
2011-08-04
Updated postfinance csv for quoted texts with lines breaks
2011-06-28
Updated ubs-import.ini for new field exchange rate
2011-03-08
Updated ubs-import.ini for charge backs with minus
2010-06-16
Version 1.0.8.0
Updated zip library
Updated ubs-import.ini for new csv format with details and totals
2010-03-30
Updated zkb-import.ini for new csv format
2010-02-18
Improved decoding of csv files inf utf-8 format
Updated zkb-import.ini for new csv format
2009-11-17
zuercherkantonalbank.ini version 1.1
Updated zkb-import.ini for new csv format
2009-08-24
ubs-import.ini version 1.7
2009-08-24
ubs-import.ini version 1.6
2009-06-04
cs-import.ini version 1.1
Updated cs-import.ini for new Credit Suisse csv format
2009-04-06
Version 1.0.6.0
Added parameters filter-sort-xxx for sorting
2009-01-14
With Postfinance ESR V11 the cedit date is new imported as value's date instead as transaction's date.
As transaction's date is now imported the processing date.
2009-01-14
Version 1.0.4.0
New import for Credit Suisse csv.
New parameter filter_reverse_order to output transactions in reverse order.
2008-08-07
Version 1.0.3.0
New UBS format included in ubs-import.ini.
2008-08-07
Version 1.0.3.0
It is now possible to import data directly from zip or tgz archives. See the description of parameter filter_archive_path.
2008-03-26
Added import for OFC (Open financial connectivity).
2008-03-06
Version: 1.0.2.0
In case of erorr the text "error: {error description}" will be outputed to the clipboard or output file, this enable to report the error condition to the application
2008-03-04
Version: 1.0.1.0
The filter for the Quicken Interchange Format can now import dates and amounts in European format too, before it was only in US format.
2008-02-26
Version: 1.0.0.0
First release
Examples
CSV file without header
In this example the value are separated by a ';' and the amounts are idetified by a sign if a credit or debit.
20080516;Telephone;-16.40; 20080516;Salary;4000.00;
'Filter parameters useclipboard=true filter_codepage= filter_header=Date\tDescription\tIncome\tExpenses\t ' Parse the line of text filter_match=^([^;]*);([^;]*);(-)?([^;]*);?$ filter_format=$1\t$2\t(?3:$4)\t(?3$4)\r\n filter_flags=format_no_copy|match_not_dot_newline|format_all
- (-)?([^;]*) : is the pattern that identify an amount
- (?3:$4) : if the sign '-' is not found (3th group of the match pattern) output the group 4
- (?3$4) : if the sign '-' is found (3th group of the match pattern) output the group 4
V11 file
In this example the value are found at a fix position, and the date has not a valid format. If the transaction is a credit or debit is given by the 3th char, '1 or 2'=credit and '5'=debit. We need more than one step to format the file.
01201051010900000000000070100800003659100000110001546 020007061107061207061300001004900000000000175 01201051010900000000000070102300003663200000230001546 020007061307061407061500001002600000000000175 01201051010900000000000070102400003664100000110001546 020007061307061407061500001002500000000000175 01201051010900000000000070105200003667800000110001546 030007060907061107061200001000600000000000175 01201051010900000000000070105305003668500000110001546 030007060907061107061200001006500000000000175
'Filter parameters
filter_codepage=
filter_header=Date\tDescription\tIncome\tExpenses\t
'Parse a line in different groups
filter_match=^..((1|2)|(5)).{9}(.{27})(.{10}).{10}.{6}.{6}(.{6}).{9}.{1}.{9}(.{4}) *$
filter_format=D#$6#\tS#$4#\t(?3A#$5#)\t(?2A#$5#)\r\n
filter_flags=format_no_copy|match_not_dot_newline|format_all
filter_match=D#(.{2})(.{2})(.{2})#
filter_format=20$1$2$3
filter_flags=
filter_match=S#0*([^#]*?)#
filter_format=$1
filter_flags=
filter_match=A#0*([^#]*?)(.{2})#
filter_format=$1\.$2
filter_flags=
Quicken Interchange Format
In this example the data of a transaction are on multiple line. In this case the row&filed matching method is the best one.
!Type:Bank D08/28/2000 T-8.15 N PCHECKCARD SUPERMARKET ^ D08/28/2000 T-8.25 N PCHECKCARD PUNJAB RESTAURANT ^ D08/28/2000 T-17.17 N PCHECKCARD SUPERMARKET
... external tool basis configuration is omitted
'Filter parameters
useclipboard=true
filter_codepage=
filter_header=Date\tDoc\tDescription\tIncome\tExpenses
'Match a transaction, every transaction is separated by the char '^'
filter_row_match=(\A|^\^)[^\^]*(^\^|\Z)
filter_row_flags=
'Date
filter_field_match=^D(.{2})/(.{2})/(.{4})$
filter_field_format=$3$1$2
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Doc number
filter_field_match=^N(.*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Payee
filter_field_match=^P(.*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
'Memo
filter_field_match=^M(.*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Debit
filter_field_match=^T([^-].*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Credit
filter_field_match=^T-(.*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\r\n
filter_field_flags=format_no_copy
Open finalcial Exchange (OFX)
This is an example for parsing xml file, without the need of xsl. Even if this is not the best solution, it is possible to get the desired data.
... <STMTTRN> <TRNTYPE> CREDIT </TRNTYPE> <DTPOSTED> 20070315 </DTPOSTED> <DTUSER> 20070315 </DTUSER> <TRNAMT> 200.00 </TRNAMT> <FITID> 980315001 </FITID> <NAME> DEPOSIT </NAME> <MEMO> Automatic deposit & loan </MEMO> </STMTTRN> ...
'Filter parameters filter_codepage= filter_header=Date\tDoc\tDescription\tIncome\tExpenses filter_row_match=<STMTTRN>.*?</STMTTRN> filter_row_flags= 'Date filter_field_match=<DTPOSTED>[\r\n ]*([^ <\r\n]+)[\r\n ]*</DTPOSTED> filter_field_format=$1 filter_field_flags=format_no_copy|match_not_dot_newline filter_field_match=\A filter_field_format=\t filter_field_flags=format_no_copy 'Doc number filter_field_match=<REFNUM>[\r\n ]*([^ <\r\n]+)[\r\n ]*</REFNUM> filter_field_format=$1 filter_field_flags=format_no_copy|match_not_dot_newline filter_field_match=\A filter_field_format=\t filter_field_flags=format_no_copy 'Description filter_field_match=<NAME>[\r\n ]*([^<\r\n]+?)[\r\n ]*</NAME> filter_field_format=$1 filter_field_flags=format_no_copy|match_not_dot_newline filter_field_match=\A filter_field_format=\t filter_field_flags=format_no_copy 'Credit filter_field_match=<TRNAMT>[\r\n ]*([^ \-<\r\n]+)[\r\n ]*</TRNAMT> filter_field_format=$1 filter_field_flags=format_no_copy|match_not_dot_newline filter_field_match=\A filter_field_format=\t filter_field_flags=format_no_copy 'Debit filter_field_match=<TRNAMT>[\r\n ]*-([^ \-<\r\n]+)[\r\n ]*</TRNAMT> filter_field_format=$1 filter_field_flags=format_no_copy|match_not_dot_newline
How to change from UPPER CASE (ABCD EFGH) to Title Case (Abcd Efgh)
'Switch to title case
filter_match=([^[:alnum:]]{1})([[:upper:]]{1})([[:upper:]]{1,})(?=[^[:alnum:]]{1})
filter_format=$1$2\L$3\E
filter_flags=match_not_dot_newline|format_all
How to sort lines with ascending dates
' Sort lines
' The first 8 characters contain the date: 20090321 ...
filter_sort_match=^(.{8})
filter_sort_format=$1
filter_sort_flags=
filter_sort_order=ascending

