Regular Expressions External Tool

Download

regextool.zip (239KB), last update: 02.11.2011

Description

This is a ready to use configurable external tool. This tool uses the boost regular expression engine to parse and format the input data. The configuration is done in the external tool’s ini file. This enables to quickly extend the application to import new formats.

The tool has too kind of parsing method. A row & field matching method and a search & replace method. You can use one or both of them, the row/field method will be executed first, the search/replace method as second step.

With the row & field matching method you define a search pattern that identify a row of data, then you define a separate search pattern for every field that will be applied to every row match to extract the field data. The matched fields are written to the output with the given format and flags. You use this method to parse xml files like the Open Financial Exchange, or txt files with data on multiple lines like the Quicken Interchange Format.

The search & replace method work like a simple search and replace. A pattern is searched and replaced with the given format and flags, the result is then written to the output. You use this method to parse simply txt files, like csv or tsv files. This is the preferred method if you simply have to prepend the header line, or change the format of the values.

If an error occurs, in the clipboard or in the given output files an error message starting with the text "error:" will be written and the value 1 is returned to the launching application.

 

Through the following parameters you can configure the tool:

 

Base parameters:

useclipboard 1 If set to "true" the filter output the result to the clipboard, otherwise to the given output file.
filter_codepage 0/1

The code page used to parse the input data. You can insert the code page number or one of the followings strings ACP, MACCP, OEMCP, SYMBOL, THREAD_ACP, UTF-7, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF32LE. If empty or not set the tool look if a BOM (byte order mask) is present in the file, or try to guess the code page in the following order: UTF16LE, UTF16BE, UTF8, ANSI.

The result is written to the output file in UTF-8, or to the clipboard in UTF-16LE.

filter_header 0/1

Define the header to prepend in the result data. This is equivalent to the column header. It is possible to use escape sequences, like '\t' for tabulator. Example: 'Date\tDoc\tDescription\tIncome\tExpenses'

filter_reverse_order 0/1 Output the transaction in reverse order.
filter_archive_path 0/1 If a zip or tgz archive containing more than one file is opened, it is possible through this parameter to define the name of the file in the archive to be readen. This parameter can contain perl regular expression. Only the first file that match the parameter will be imported.
debug 0/1 If set to "true", the tool write to the console the results of every regular expressions.

 

Parameters for the row & field matching method:

filter_row_match 0/1 Search pattern that identify a row (transaction, account, ...). 1 globally
filter_row_flags 0/1 Flags used for the filter_row_match.
filter_field_match 0+ Search pattern that identify a field in a row (Date, description, amount, ...). This pattern is applied to the result of the filter_row_match pattern. 1 of each for every field
filter_field_format 0+ Pattern for the formatting of the output of the matching field.
filter_field_flags 0+ Flags used for the filter_field_match.

Note: You need the same number of filter_field_match, filter_field_format, filter_field_flags parameters.


Parameters for the search & replace method:

filter_match 0+ Search pattern. 1 of each for every rule, more rules are possible
filter_format 0+ Format pattern for the output of the replaced text.
filter_flags 0+ Flags used for the filter_field_match.

Note: in the ini file you need the same number of filter_match, filter_format, filter_flags parameters. With the flag "format_no_copy" the text that is not part of the match will not be copied to the output, this is useful in case you want to delete part of the content.

The syntax for the match, format and flags parameters follow the syntax of the boost regular expression engine (like the syntax of pearl regular expression). The description, some examples and tutorials can be found on the following page: Boost.RegEx.

 

Parameters for line sorting:

The followings parameters are used to sort the lines, it is possible through regular expressions to identify the part of text used for sorting and the order of the sort.

filter_sort_match 0/1 Search pattern. 1 of each for every rule, more rules are possible
filter_sort_format 0/1 Format pattern for the output of the replaced text.
filter_sort_flags 0/1 Flags used for the sorting match
filter_sort_order 0/1 Order for sorting "ascending" or "descending"

 

Optional command line parameters

-debug Overwrite the debug setting in the ini file to true, this is used only for debugging.
-useoutfile Overwrite the useclipboard setting in the ini file to false, this is used only for debugging.
-version Print the version of the file to the console.

 

Regular expression syntax

The syntax for the match, format and flags parameters follow the syntax of the boost regular expression engine (like the syntax of pearl regular expression). 
The documentation of Boost Regular Expression is found at the following page:  Boost.RegEx.

If you  are new to regular expressions you can start from the following tutorial pages:

Perl regular expressions
Perl regular expressions tutorials
Short tutorial Using regular expressions from the University of Virginia

 

Requirements

Banana Accounting 5.0.7 or above.

Installation

Download the file regex_filter.zip attached to this page. Unpack and copy the content to "{Program files}\Banana50\ExternalTools" folder. ( {Program files} is the path where the programs are installed, usually "C:\Program files") . Start Banana Accounting 5.0. Open an account and go to the menu "Account1" - "Import to the accounting", you should now see the pre-defined formats in the list.

If you want to define a new format to import, copy an existing *.ini file and modify the tool's parameters.

 

Pre-defined formats

Version regextool: 1.0.9.0 (02.11.2011)

Ini file Version Last Update Description
cbi-import 1.0 23.11.2010 CBI (TXT)
china-construction-bank-csv.ini 1.1 16.12.2008 China Construction Bank (CSV)
cs-import.ini 1.1 04.06.2009 Credit Suisse (CSV)
hypovereinsbank-import.ini 1.4 24.03.2009 Hypovereinsbank (CSV)
lukb-import.ini 1.2 24.07.2009 Luzerner Kantonalbank (CSV)
mijnpostbank-import.ini 1.1 10.11.2010 Mijnpostbank (CSV)
ofc-import.ini 1.1 16.12.2008 Open financial connectivity (OFC)
ofx-import.ini 1.1 16.12.2008 Open Financial Exchange (OFX)
v11-import.ini 1.2 16.12.2010 Postfinance ESR V11 (V11)
pf-import.ini 1.2 02.11.2011 Postfinace xml account card (XML,TAR.GZ)
pf-csv-import.ini 1.3 04.08.2011 Postfinace csv account card (CSV)
qif-import.ini 1.2 16.12.2008 Quicken Interchange Format (QIF)
rabobank-import.ini 1.0 11.11.2010 Rabobank (CSV)
raif-import.init 1.1 15.11.2010 Raiffeinsen Bank (CSV)
sgkb-import.ini 1.2 24.07.2009 St Gallen Kantonalbank (CSV)
tkb-import.ini 1.2 24.07.2009 Thurgauer Kantonalbank (CSV,ZIP)
ubs-import.ini 1.13 28.06.2011 UBS Bank (CSV)
zkb-import.ini 1.3 30.03.2010 Zürcher Kantonalbank (CSV)

 

Changelog

2011-11-02
Updated regextool to correct an error with some tgz files
Updated postfinance xml to import account cost too

2011-08-04
Updated postfinance csv for quoted texts with lines breaks

2011-06-28
Updated ubs-import.ini for new field exchange rate

2011-03-08
Updated ubs-import.ini for charge backs with minus

2010-06-16
Version 1.0.8.0
Updated zip library
Updated ubs-import.ini for new csv format with details and totals

2010-03-30
Updated zkb-import.ini for new csv format

2010-02-18
Improved decoding of csv files inf utf-8 format
Updated zkb-import.ini for new csv format

2009-11-17
zuercherkantonalbank.ini version 1.1
Updated zkb-import.ini for new csv format

2009-08-24
ubs-import.ini version 1.7

2009-08-24
ubs-import.ini version 1.6

2009-06-04
cs-import.ini version 1.1
Updated cs-import.ini for new Credit Suisse csv format

2009-04-06
Version 1.0.6.0
Added parameters filter-sort-xxx for sorting

2009-01-14
With Postfinance ESR V11 the cedit date is new imported as value's date instead as transaction's date.
As transaction's date is now imported the processing date.

2009-01-14
Version 1.0.4.0
New import for Credit Suisse csv.
New parameter filter_reverse_order to output transactions in reverse order.

2008-08-07
Version 1.0.3.0
New UBS format included in ubs-import.ini.

2008-08-07
Version 1.0.3.0
It is now possible to import data directly from zip or tgz archives. See the description of parameter filter_archive_path.

2008-03-26
Added import for OFC (Open financial connectivity).

2008-03-06
Version: 1.0.2.0
In case of erorr the text "error: {error description}" will be outputed to the clipboard or output file, this enable to report the error condition to the application

2008-03-04
Version: 1.0.1.0
The filter for the Quicken Interchange Format can now import dates and amounts in European format too, before it was only in US format.

2008-02-26
Version: 1.0.0.0
First release

 

Examples

CSV file without header

In this example the value are separated by a ';' and the amounts are idetified by a sign if a credit or debit.

20080516;Telephone;-16.40;
20080516;Salary;4000.00;
'Filter parameters
useclipboard=true
filter_codepage=
filter_header=Date\tDescription\tIncome\tExpenses\t
' Parse the line of text
filter_match=^([^;]*);([^;]*);(-)?([^;]*);?$
filter_format=$1\t$2\t(?3:$4)\t(?3$4)\r\n
filter_flags=format_no_copy|match_not_dot_newline|format_all

- (-)?([^;]*) : is the pattern that identify an amount
- (?3:$4) : if the sign '-' is not found (3th group of the match pattern) output the group 4
- (?3$4) : if the sign '-' is found (3th group of the match pattern) output the group 4

V11 file

In this example the value are found at a fix position, and the date has not a valid format. If the transaction is a credit or debit is given by the 3th char, '1 or 2'=credit and '5'=debit. We need more than one step to format the file.

01201051010900000000000070100800003659100000110001546  020007061107061207061300001004900000000000175                          
01201051010900000000000070102300003663200000230001546  020007061307061407061500001002600000000000175                          
01201051010900000000000070102400003664100000110001546  020007061307061407061500001002500000000000175                          
01201051010900000000000070105200003667800000110001546  030007060907061107061200001000600000000000175                          
01201051010900000000000070105305003668500000110001546  030007060907061107061200001006500000000000175                          
'Filter parameters
filter_codepage=
filter_header=Date\tDescription\tIncome\tExpenses\t
'Parse a line in different groups
filter_match=^..((1|2)|(5)).{9}(.{27})(.{10}).{10}.{6}.{6}(.{6}).{9}.{1}.{9}(.{4}) *$
filter_format=D#$6#\tS#$4#\t(?3A#$5#)\t(?2A#$5#)\r\n
filter_flags=format_no_copy|match_not_dot_newline|format_all
filter_match=D#(.{2})(.{2})(.{2})#
filter_format=20$1$2$3
filter_flags=
filter_match=S#0*([^#]*?)#
filter_format=$1
filter_flags=
filter_match=A#0*([^#]*?)(.{2})#
filter_format=$1\.$2
filter_flags=

Quicken Interchange Format

In this example the data of a transaction are on multiple line. In this case the row&filed matching method is the best one.

!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/28/2000
T-8.25
N
PCHECKCARD PUNJAB RESTAURANT
^
D08/28/2000
T-17.17
N
PCHECKCARD SUPERMARKET 
... external tool basis configuration is omitted
'Filter parameters
useclipboard=true
filter_codepage=
filter_header=Date\tDoc\tDescription\tIncome\tExpenses
'Match a transaction, every transaction is separated by the char '^'
filter_row_match=(\A|^\^)[^\^]*(^\^|\Z)
filter_row_flags=
'Date
filter_field_match=^D(.{2})/(.{2})/(.{4})$
filter_field_format=$3$1$2
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Doc number
filter_field_match=^N(.*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Payee
filter_field_match=^P(.*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
'Memo
filter_field_match=^M(.*)$
filter_field_format=$1 
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Debit
filter_field_match=^T([^-].*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Credit
filter_field_match=^T-(.*)$
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\r\n
filter_field_flags=format_no_copy

Open finalcial Exchange (OFX)

This is an example for parsing xml file, without the need of xsl. Even if this is not the best solution, it is possible to get the desired data.

...
<STMTTRN>
<TRNTYPE>
CREDIT
</TRNTYPE>
<DTPOSTED>
20070315
</DTPOSTED>
<DTUSER>
20070315
</DTUSER>
<TRNAMT>
200.00
</TRNAMT>
<FITID>
980315001
</FITID>
<NAME>
DEPOSIT
</NAME>
<MEMO>
Automatic deposit & loan
</MEMO>                                                
</STMTTRN>
...

'Filter parameters
filter_codepage=
filter_header=Date\tDoc\tDescription\tIncome\tExpenses
filter_row_match=<STMTTRN>.*?</STMTTRN>
filter_row_flags=
'Date
filter_field_match=<DTPOSTED>[\r\n ]*([^ <\r\n]+)[\r\n ]*</DTPOSTED>
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Doc number
filter_field_match=<REFNUM>[\r\n ]*([^ <\r\n]+)[\r\n ]*</REFNUM>
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Description 
filter_field_match=<NAME>[\r\n ]*([^<\r\n]+?)[\r\n ]*</NAME>
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Credit
filter_field_match=<TRNAMT>[\r\n ]*([^ \-<\r\n]+)[\r\n ]*</TRNAMT>
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline
filter_field_match=\A
filter_field_format=\t
filter_field_flags=format_no_copy
'Debit
filter_field_match=<TRNAMT>[\r\n ]*-([^ \-<\r\n]+)[\r\n ]*</TRNAMT>
filter_field_format=$1
filter_field_flags=format_no_copy|match_not_dot_newline 

How to change from UPPER CASE (ABCD EFGH) to Title Case (Abcd Efgh)

'Switch to title case
filter_match=([^[:alnum:]]{1})([[:upper:]]{1})([[:upper:]]{1,})(?=[^[:alnum:]]{1})
filter_format=$1$2\L$3\E
filter_flags=match_not_dot_newline|format_all

How to sort lines with ascending dates

' Sort lines
' The first 8 characters contain the date: 20090321 ...
filter_sort_match=^(.{8})
filter_sort_format=$1
filter_sort_flags=
filter_sort_order=ascending

 

Legal © 1998-2013 Banana.ch SA (Switzerland)