Reading BUFR using libECBufr with Fortran App in mind

2014-09-14 TOYODA Eizi

http://toyoda-eizi.net/2014/0912hybufr/

Environment Canada's BUFR decoder, libECBufr, is neat and really stable software that I would recommend. By "stable" I honestly mean that it can decode some JMA BUFR messages that other decoders cannot. This note shows a quick-hack way (I don't claim it is the best) to use it from a Fortran-based computing application, to demonstrate that it is possible.

Preparation

Source and binary packages of libECBufr are available at https://launchpad.net/libecbufr. If you are using Debian or Ubuntu, you can simply download the *.deb binary package and install it. Otherwise, the source code is officially distributed via the bzr tool:

$ bzr branch lp:libecbufr

For those who are not familiar with Debian tools, I have prepared a tarball that can be used instead. Environment Canada also provides installation guides.

Decoding

Environment Canada gives a quick usage guide. The recommended option for generating computer-readable output is "-dump". Assuming the input is input-bufr.bin, the naive command line would be:

$ BUFR_TABLES=/usr/share/libecbufr bufr_decoder -inbufr input-bufr.bin -dump -output dump.txt

But sometimes that fails. An error can be detected by the presence of the file "DEBUG.decoder", which looks like the following:

Descriptor 1210 ??
Error: unknown descriptor 1210
Descriptor 13209 ??
Error: unknown descriptor 13209
Error: Template definition contains error(s)
Error: Unable to create Template
Error: can't decode messages

Note the "unknown descriptor" messages. They indicate that some entries are missing from BUFR Table B, which is stored in the file /usr/share/libecbufr/table_b_bufr by default.
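
The missing descriptors can be collected from DEBUG.decoder mechanically. The following is a minimal Python sketch, not part of libECBufr; it assumes the log lines have exactly the form shown above, and the function name unknown_descriptors is hypothetical. The decoder prints descriptors without leading zeros, so the sketch pads them to six digits to match the table entries.

```python
import re

# Pattern inferred from the sample DEBUG.decoder above (an assumption);
# other libECBufr versions may word the message differently.
UNKNOWN = re.compile(r"^Error: unknown descriptor (\d+)$")

def unknown_descriptors(lines):
    """Return unknown descriptors, zero-padded to six digits, in order of appearance."""
    found = []
    for line in lines:
        match = UNKNOWN.match(line.strip())
        if match:
            code = match.group(1).zfill(6)
            if code not in found:  # report each descriptor only once
                found.append(code)
    return found

sample = """Descriptor 1210 ??
Error: unknown descriptor 1210
Descriptor 13209 ??
Error: unknown descriptor 13209
Error: Template definition contains error(s)
Error: Unable to create Template
Error: can't decode messages
"""

print(unknown_descriptors(sample.splitlines()))
```

To run it on a real log, pass an open file instead of the sample, for example `unknown_descriptors(open("DEBUG.decoder"))`.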

Fixing tables

Each BUFR message contains a sequence of six-digit "descriptors", followed by the actual data bits. Each element descriptor can be thought of as a variable identifier: the element can be a numeric value with units, an ASCII string (called by its obsolete name, CCITT IA5), or a code/flag table reference. The decoder reads Table B to find the name, units, number of bits, and reference value needed to decode the actual data. Thus the decoder has to stop at an unknown descriptor, because the number of bits of the accompanying data cannot be known a priori.
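
To make the role of the Table B entry concrete, here is a minimal Python sketch of how a numeric element is recovered from its raw bits. It relies on the standard BUFR rule that the physical value is (raw + reference value) / 10^scale, where the scale is another column of Table B, and that a field with all bits set means "missing". The function name decode_element is hypothetical, and the Table B numbers used for 005001 are my recollection of the WMO table, so please check them against your own table_b_bufr.

```python
def decode_element(raw, width, scale, reference):
    """Convert the raw unsigned integer of a numeric element to its physical value.

    raw       -- unsigned integer read from `width` bits of the data section
    width     -- number of bits from Table B
    scale     -- decimal scale from Table B
    reference -- reference value from Table B
    Returns None when all bits are set, which marks a missing value.
    """
    if raw == (1 << width) - 1:
        return None
    return (raw + reference) / 10 ** scale

# Assumed Table B entry for 005001 LATITUDE (HIGH ACCURACY):
# scale 5, reference value -9000000, data width 25 bits.
print(decode_element(13500000, width=25, scale=5, reference=-9000000))
```

Without the width, the decoder cannot even tell where the next element starts, which is why a single unknown descriptor stops the whole message.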

In this case the sequence of descriptors is as follows:

008021   TIME SIGNIFICANCE                     CODE TABLE     
004001   YEAR                                  YEAR           
004002   MONTH                                 MONTH          
004003   DAY                                   DAY            
004004   HOUR                                  HOUR           
004005   MINUTE                                MINUTES        
001210   RIVER IDENTIFIER                      NUMERIC        
005001   LATITUDE(HIGH ACCURACY)               DEGREE         
006001   LONGITUDE (HIGH ACCURACY)             DEGREE         
013209   RUNOFF INDEX                          NUMERIC        

The missing table entry may be a recent addition to the WMO Standard Table B, which is usually updated twice a year and posted on the WMO website. But if the lowest three digits of the descriptor are 192 or more, it is a local descriptor, which can be defined independently by the originating center (in this case JMA). There should be some documentation; otherwise, please consult the data provider.

Note: If the decoder reports a descriptor whose top digit is '3' (i.e. 3xxxxx), that is a sequence descriptor, which is stored in Table D (file table_d_bufr for libECBufr). A sequence descriptor is shorthand for several (or even dozens of) other descriptors, and is used to reduce the size of the message.
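
The rules above can be summarized in a few lines of code. The following Python sketch splits a six-digit descriptor into its top digit, the next two digits, and the lowest three digits, and reports which table it belongs to and whether it looks like a local descriptor. The function name classify and the returned labels are hypothetical. The sketch applies only the "192 or more" rule described above, and only to element and sequence descriptors, because the lowest three digits of a replication descriptor carry a count rather than an entry number.

```python
KINDS = {0: "element", 1: "replication", 2: "operator", 3: "sequence"}

def classify(descriptor):
    """Return (kind, is_local) for a six-digit descriptor string such as '001210'."""
    if len(descriptor) != 6 or not descriptor.isdigit():
        raise ValueError("expected six digits, got %r" % descriptor)
    top = int(descriptor[0])      # 0: Table B, 1: replication, 2: operator, 3: Table D
    entry = int(descriptor[3:])   # lowest three digits
    if top not in KINDS:
        raise ValueError("top digit must be 0..3, got %r" % descriptor)
    # Entries numbered 192 or more are defined locally by the originating center.
    is_local = top in (0, 3) and entry >= 192
    return KINDS[top], is_local

for code in ("004001", "001210", "013209", "301011"):
    print(code, classify(code))
```

Both descriptors reported in DEBUG.decoder come out as local elements, so they have to be added to Table B from the data provider's documentation rather than from the WMO tables.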

Either way, you will end up with revised Tables B and D. The command line then becomes:

$ bufr_decoder -ltableb ./table_b_bufr -ltabled ./table_d_bufr -inbufr input-bufr.bin -dump -output dump.txt

Reading Dump File

The dump file looks like the following:

BUFR_EDITION=4                ---+
BUFR_MASTER_TABLE=0              |
ORIG_CENTER=34                   |
ORIG_SUB_CENTER=0                +--- one per each BUFR message
...                              |
COMPRESSED=1                  ---+
DATASUBSET 1 : 13 codes       ---+
008021 16                        |
301011                           |
004001 2014                      |
004002 7                         |
004003 10                        |
301012                           +--- repeated subsets
004004 1                         |
004005 0                         |
001210 81001013                  |
301021                           |
005001 45.00000                  |
006001 141.68750                 |
013209 3                      ---+

DATASUBSET 2 : 13 codes
008021 16
...

The sample code readdump.f90 is an example Fortran program that reads this format. It reads a filename from standard input, opens and reads the dump file, and then prints the decoded data as CSV:

$ echo dump.txt | ./readdump | more

It can be modified as appropriate for incorporation into the computing application.
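
For readers who want to see the parsing logic before opening the Fortran source, here is a minimal Python sketch of the same idea. It is not a transcription of readdump.f90, and the CSV layout it prints is my own choice. The line shapes it recognizes (KEY=VALUE header lines, "DATASUBSET n : m codes" lines, and "descriptor value" lines) are inferred only from the excerpt above, so real dumps may contain cases it does not handle. The function name parse_dump is hypothetical.

```python
def parse_dump(lines):
    """Parse a '-dump' listing into (header, subsets).

    header  -- dict of KEY=VALUE lines (later messages overwrite earlier ones)
    subsets -- list of subsets, each a list of (descriptor, value) string pairs
    """
    header, subsets, current = {}, [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("DATASUBSET"):
            current = []
            subsets.append(current)
            continue
        parts = line.split(None, 1)
        if len(parts[0]) == 6 and parts[0].isdigit():
            # Lines with a descriptor only (e.g. 301011) carry no value: skip them.
            if current is not None and len(parts) == 2:
                current.append((parts[0], parts[1].strip()))
        elif "=" in line:
            key, _, value = line.partition("=")
            header[key.strip()] = value.strip()
    return header, subsets

sample = """\
BUFR_EDITION=4
ORIG_CENTER=34
COMPRESSED=1
DATASUBSET 1 : 13 codes
008021 16
301011
004001 2014
004002 7
004003 10
301012
004004 1
004005 0
001210 81001013
301021
005001 45.00000
006001 141.68750
013209 3

DATASUBSET 2 : 13 codes
008021 16
"""

header, subsets = parse_dump(sample.splitlines())
for number, subset in enumerate(subsets, start=1):
    # One CSV row per subset: subset number, then the values in order.
    print(",".join([str(number)] + [value for _, value in subset]))
```

Values are kept as strings here; converting them to numbers is left to the application, since character-type elements would need separate treatment.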

Limitations

The sample code readdump.f90 may also be used for dumps of other BUFR data. Although it is intended to be generic to some extent, the following limitations remain:

Size limitation
The number of subsets and the number of data elements in each subset are limited to 8192 and 256 respectively. If the size really matters, you can change the PARAMETER values and recompile.
All subsets must have the same structure
That is because the code determines the structure from the first subset. The code would be much more complex otherwise. Uncompressed BUFR messages with delayed replication (i.e. variable repetition by a descriptor with top digit '1') run into this problem.
Strings not well supported
Only the first eight characters are retained for each character-type element. Proper handling would require memory management.
Operators not supported at all
Special descriptors with top digit '2' are called operators, and they are used to describe various metadata. I don't know them well.
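
Before feeding a new kind of BUFR dump to a reader that assumes a fixed structure, it may be worth checking that assumption. The following Python sketch reports the subsets whose descriptor sequence differs from that of the first subset. It assumes the subsets have already been read into lists of (descriptor, value) pairs; that representation and the function name structure_mismatches are hypothetical and not taken from readdump.f90.

```python
def structure_mismatches(subsets):
    """Return the 1-based numbers of subsets whose descriptors differ from the first subset."""
    if not subsets:
        return []
    first = [descriptor for descriptor, _ in subsets[0]]
    return [
        number
        for number, subset in enumerate(subsets, start=1)
        if [descriptor for descriptor, _ in subset] != first
    ]

subset_a = [("004001", "2014"), ("013209", "3")]
subset_b = [("004001", "2014"), ("013209", "5")]   # same structure, different values
subset_c = [("004001", "2014")]                    # one element short

print(structure_mismatches([subset_a, subset_b, subset_c]))
```

An empty result means the fixed-structure assumption holds for that dump.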

Name                                                            Last modified      Size  Description
GPLv3.txt                                                       14-Sep-2014 09:53  34K   Lesser GNU License
Makefile                                                        13-Sep-2014 18:49  1.0K  build instruction
Note.html                                                       14-Sep-2014 09:40  7.1K
Z__C_RJTD_20140710010000_MET_SEQ_RAwstd_Proi_ANAL_bufr.bin      12-Sep-2014 13:16  25K   Analysis data sample
Z__C_RJTD_20140710010000_MET_SEQ_RAwstd_Proi_FH01-06_bufr4.bin  12-Sep-2014 13:16  36K   Forecast data sample
anldump.csv                                                     17-Sep-2014 05:29  466K  result in CSV
anldump.txt                                                     14-Sep-2014 07:37  473K  dump by libECBufr
fcsdump.csv                                                     17-Sep-2014 05:29  780K  result in CSV
fcsdump.txt                                                     14-Sep-2014 07:37  775K  dump by libECBufr
readdump.f90                                                    14-Sep-2014 08:33  5.6K  converter
table_b_bufr                                                    13-Sep-2014 18:46  143K  modified table B
table_d_bufr                                                    12-Sep-2014 13:54  75K   (un)modified Table D
