I spent some time coding on the weekend (which would be considered sad by many, but not by me) and built a simple push button GUI interface that allows the user to refine WOD queries by instrument and month. If I wasn't limited by the X interface, I'd add more SQL type functionality and allow for queries on any field (year, cruise ID, etc). More work in the future I guess. Here's a screen shot of the button bar that allows the user to choose instrument type and month of observation, the top row is for the month, the bottom row is for instrument type. The labels on the buttons switch to UPPERCASE when selected and back to lowercase when deselected. The query is only done after a geographic selection in the map window. I've shown an example of several hundred CTD casts collected in the months of July, August and September in the Northwest Passage in northern Canada, one of my favourite areas that I've ever worked in.
The CTD data is plotted in the main SVP Tool window above, with sound speed, temperature and salinity plotted versus depth (left to right, respectively).
More work? Of course. If I turn on the debug mode of the WOD query engine, I can see that it spends an awful lot of its time unzipping the raw source data for each WMO square that it visits during the query extraction. I'm still looking to speed this up if i can, right now the code fires off a system('gunzip -c file > tempfile'); whenever it hits a zipped WOD file. I'm curious if it would be faster to run through zlib instead? Kurt had also suggested converting the WOD files to a binary format so that's another option that would guarantee to bear some fruit but I really do like the elegance of sticking with the raw WOD files (easy to update, no need to reconvert with each update). More thinking needs to be done, obviously... Suggestions are welcome.
Tuesday, August 10, 2010
Wednesday, August 4, 2010
Downloading WOD
You can download the WOD files from the NODC website through an online query (http://www.nodc.noaa.gov/OC5/SELECT/dbsearch/dbsearch.html), you can also download the entire database if you look hard enough for the links (http://www.nodc.noaa.gov/OC5/WOD09/data09geo.html). I didn't feel like pointing and clicking myself to death so I wrote this script to download the entire thing. I also added a line in the script to compile a summary database of the CTD and XBT data for use in svp_tool.
#!/bin/bash
count=0
mkdir -p {OSD,MBT,CTD,XBT,PFL,MRB,DRB,APB,UOR,GLD}/OBS
mkdir -p {OSD,MBT,CTD,XBT,PFL,MRB,DRB,APB,UOR,GLD}/STD
echo "Downloading the WOD database"
for quadrant in 1 3 5 7
do
for latitude in 0 1 2 3 4 5 6 7 8
do
for longitude in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17
do
let count++
for type in OSD MBT CTD XBT PFL MRB DRB APB UOR GLD
do
# Get the "observed" data
file=$type\O$quadrant$latitude$longitude.gz
wget http://data.nodc.noaa.gov/woa/WOD09/GEOGRAPHIC/$type/OBS/$file
mv $file $type/OBS
# Get the "standard level" data (observed is filtered and then
# interpolated to standard depth levels)
file=$type\S$quadrant$latitude$longitude.gz
wget http://data.nodc.noaa.gov/woa/WOD09/GEOGRAPHIC/$type/STD/$file
mv $file $type/STD
done
done
done
done
echo got total $count
# Now add a tmp directory for use by the WOD library for unzipping database files
# during extraction
mkdir tmp
# Now build database of CTD and XBT data
echo "Building a CTD/XBT summary database"
wod2db3 -out wod.db3 CTD/OBS/CTDO*gz XBT/OBS/XBTO*gz
echocount=0
mkdir -p {OSD,MBT,CTD,XBT,PFL,MRB,DRB,APB,UOR,GLD}/OBS
mkdir -p {OSD,MBT,CTD,XBT,PFL,MRB,DRB,APB,UOR,GLD}/STD
echo "Downloading the WOD database"
for quadrant in 1 3 5 7
do
for latitude in 0 1 2 3 4 5 6 7 8
do
for longitude in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17
do
let count++
for type in OSD MBT CTD XBT PFL MRB DRB APB UOR GLD
do
# Get the "observed" data
file=$type\O$quadrant$latitude$longitude.gz
wget http://data.nodc.noaa.gov/woa/WOD09/GEOGRAPHIC/$type/OBS/$file
mv $file $type/OBS
# Get the "standard level" data (observed is filtered and then
# interpolated to standard depth levels)
file=$type\S$quadrant$latitude$longitude.gz
wget http://data.nodc.noaa.gov/woa/WOD09/GEOGRAPHIC/$type/STD/$file
mv $file $type/STD
done
done
done
done
echo got total $count
# Now add a tmp directory for use by the WOD library for unzipping database files
# during extraction
mkdir tmp
# Now build database of CTD and XBT data
echo "Building a CTD/XBT summary database"
wod2db3 -out wod.db3 CTD/OBS/CTDO*gz XBT/OBS/XBTO*gz
echo
echo "Add WOD_PATH="`pwd` "as an environment variable"
echo "Add WOD_DATABASE_NAME="`pwd`"/"wod.db3 "as an environment variable"
exit
Tuesday, August 3, 2010
sqlite3 and the World Ocean Database, continued
I got things sped up quite a bit by directly querying the database file I had created (see my last post) and by returning the query results as an array of watercolumn structures instead of writing them to disk and then reading them in again in my svp_tool.
Here's the bit of code that deals with building the query:
// Form the query: select records based on geo/time bounds and then sort in such a way that
// the hits return in WOD filename order to minimize jumping around between files
sprintf(sql_string,"SELECT * FROM tbl1 WHERE latitude > %.3f AND latitude < %.3f AND longitude > %.3f AND longitude < %.3f AND month in %s AND instrument in %s ORDER BY instrument, observed, quadrant, wmo_lat, wmo_lon, year, month, day;",
wod_reader->query_bounds.latitude_south, wod_reader->query_bounds.latitude_north,
wod_reader->query_bounds.longitude_west, wod_reader->query_bounds.longitude_east,month_list,instrument_list);
if (wod_reader->debug) printf("Doing SQL string %s\n",sql_string);
printf("Doing WOD query...patience\n");
// Do the query and deal with the results
rc = sqlite3_exec(wod_reader->db, sql_string, wod_query_callback, wod_reader, &zErrMsg);
if (rc) error("SQL error: %s",zErrMsg);
return(0);
}
The callback function (wod_query_callback) in the sqlite3_exec call deals with the query results one at a time. Here'swhat's going on in there:
static int wod_query_callback(void * reader_in, int argc, char **argv, char **azColName)
{
int i;
int iend;
int got_new_file;
WOD_Reader *wod_reader;
char * inname = NULL;
int file_offset, num_bytes;
int instrument, observed, quadrant, wmo_lat, wmo_lon;
char type_lowercase[4];
char type_uppercase[4];
memset(type_uppercase,0,4);
memset(type_lowercase,0,4);
wod_reader = (WOD_Reader *) reader_in;
got_new_file = 0;
// Parse the output record from the query
for(i=0; i
{
if (wod_reader->debug) printf("%s = %s\n", azColName[i], argv[i] ? argv[i] : "NULL");
if (!strcmp(azColName[i],"instrument"))
{
instrument = atoi(argv[i]);
if (instrument != wod_reader->instrument) got_new_file++;
}
else if (!strcmp(azColName[i],"observed"))
{
observed = atoi(argv[i]);
if (observed != wod_reader->observed) got_new_file++;
}
else if (!strcmp(azColName[i],"quadrant"))
{
quadrant = atoi(argv[i]);
if (quadrant != wod_reader->quadrant) got_new_file++;
}
else if (!strcmp(azColName[i],"wmo_lat"))
{
wmo_lat = atoi(argv[i]);
if (wmo_lat != wod_reader->wmo_lat) got_new_file++;
}
else if (!strcmp(azColName[i],"wmo_lon"))
{
wmo_lon = atoi(argv[i]);
if (wmo_lon != wod_reader->wmo_lon) got_new_file++;
}
else if (!strcmp(azColName[i],"file_offset"))
{
file_offset = atoi(argv[i]);
}
else if (!strcmp(azColName[i],"num_bytes"))
{
num_bytes = atoi(argv[i]);
}
}
if (got_new_file)
{
if (wod_reader->debug) printf("Need to load a new WOD file!!\n");
// Got a new file, so close the old one...
wod_close_file(wod_reader);
memcpy(type_uppercase,wod_data_types[instrument],3);
type_lowercase[0] = tolower(type_uppercase[0]);
type_lowercase[1] = tolower(type_uppercase[1]);
type_lowercase[2] = tolower(type_uppercase[2]);
// Build up the filename
inname = calloc(strlen(wod_reader->wod_path)+50,1);
memset(inname,0,strlen(wod_reader->wod_path)+50);
// this the type of path that we're trying to construct: ctd/OBS/CTDO7807.gz
sprintf(inname,"%s/%s/OBS/%sO%d%d%0.2d.gz",
wod_reader->wod_path,type_lowercase,type_uppercase,quadrant,wmo_lat,wmo_lon);
if (wod_open_file(wod_reader,inname))
{
sprintf(inname,"%s/%s/OBS/%sO%d%d%0.2d",wod_reader->wod_path,type_lowercase,type_uppercase,
quadrant,wmo_lat,wmo_lon);
if (wod_open_file(wod_reader,inname))
{
printf("Can't open file %s ... non-existant??\n",inname);
free(inname);
inname = NULL;
return(1);
}
}
if (inname) free(inname);
inname = NULL;
}
if (wod_reader->debug) printf("%s: fseeking to %d for %d bytes\n",wod_reader->fname, file_offset,num_bytes);
// Seek to the right spot in the file
fseek(wod_reader->fp,file_offset,SEEK_SET);
// Now read the profile at that location
if ((iend = oclread (wod_reader)) == -1)
{
wod_close_file(wod_reader);
return(1);
}
// Convert the cast to OMG SVP format, this updates the extracted_watercolumns array
wod2svp(wod_reader);
return 0;
}
Now I've just got to build a query building utility in the graphical end that lets the user select instrument, month and/or other fields. Geographic bounds are currently selected by lasso in the map viewer and the month defaults to the current month.
I'm also planning on speeding up the oclread function (was delivered with the free NODC code that came with the WOD), right now it's reading things a single field at a time and I think it can be sped up a bit by reading the entire cast into memory and then parsing it from there.
jb
Here's the bit of code that deals with building the query:
int wod_extract_profiles (WOD_Reader * wod_reader)
{
int rc;
char *zErrMsg = 0;
int i;
int offset;
char month_list[50];
char instrument_list[50];
char sql_string[500];
memset(month_list,0,50);
memset(instrument_list,0,50);
memset(sql_string,0,500);
// There must be easier ways to do this than the way I'm doing it
month_list[0]='(';
offset = 1;
for (i=1;i<=12;i++)
{
if (!wod_reader->query_bounds.month[i]) continue;
sprintf(&month_list[offset],"%d,",i);
offset = strlen(month_list);
}
month_list[offset-1]=')';
// There must be easier ways to do this than the way I'm doing it
instrument_list[0]='(';
offset = 1;
for (i=1;i<=WOD_NUM_TYPES;i++)
{
if (!wod_reader->query_bounds.instrument[i]) continue;
sprintf(&instrument_list[offset],"%d,",i);
offset = strlen(instrument_list);
}
instrument_list[offset-1]=')';
// Check some of the query bounds before we start
if (wod_reader->query_bounds.longitude_west > 180) wod_reader->query_bounds.longitude_west -= 360;
if (wod_reader->query_bounds.longitude_east > 180) wod_reader->query_bounds.longitude_east -= 360;
{
int rc;
char *zErrMsg = 0;
int i;
int offset;
char month_list[50];
char instrument_list[50];
char sql_string[500];
memset(month_list,0,50);
memset(instrument_list,0,50);
memset(sql_string,0,500);
// There must be easier ways to do this than the way I'm doing it
month_list[0]='(';
offset = 1;
for (i=1;i<=12;i++)
{
if (!wod_reader->query_bounds.month[i]) continue;
sprintf(&month_list[offset],"%d,",i);
offset = strlen(month_list);
}
month_list[offset-1]=')';
// There must be easier ways to do this than the way I'm doing it
instrument_list[0]='(';
offset = 1;
for (i=1;i<=WOD_NUM_TYPES;i++)
{
if (!wod_reader->query_bounds.instrument[i]) continue;
sprintf(&instrument_list[offset],"%d,",i);
offset = strlen(instrument_list);
}
instrument_list[offset-1]=')';
// Check some of the query bounds before we start
if (wod_reader->query_bounds.longitude_west > 180) wod_reader->query_bounds.longitude_west -= 360;
if (wod_reader->query_bounds.longitude_east > 180) wod_reader->query_bounds.longitude_east -= 360;
// the hits return in WOD filename order to minimize jumping around between files
sprintf(sql_string,"SELECT * FROM tbl1 WHERE latitude > %.3f AND latitude < %.3f AND longitude > %.3f AND longitude < %.3f AND month in %s AND instrument in %s ORDER BY instrument, observed, quadrant, wmo_lat, wmo_lon, year, month, day;",
wod_reader->query_bounds.latitude_south, wod_reader->query_bounds.latitude_north,
wod_reader->query_bounds.longitude_west, wod_reader->query_bounds.longitude_east,month_list,instrument_list);
if (wod_reader->debug) printf("Doing SQL string %s\n",sql_string);
printf("Doing WOD query...patience\n");
// Do the query and deal with the results
rc = sqlite3_exec(wod_reader->db, sql_string, wod_query_callback, wod_reader, &zErrMsg);
if (rc) error("SQL error: %s",zErrMsg);
return(0);
}
The callback function (wod_query_callback) in the sqlite3_exec call deals with the query results one at a time. Here'swhat's going on in there:
static int wod_query_callback(void * reader_in, int argc, char **argv, char **azColName)
{
int i;
int iend;
int got_new_file;
WOD_Reader *wod_reader;
char * inname = NULL;
int file_offset, num_bytes;
int instrument, observed, quadrant, wmo_lat, wmo_lon;
char type_lowercase[4];
char type_uppercase[4];
memset(type_uppercase,0,4);
memset(type_lowercase,0,4);
wod_reader = (WOD_Reader *) reader_in;
got_new_file = 0;
// Parse the output record from the query
for(i=0; i
{
if (wod_reader->debug) printf("%s = %s\n", azColName[i], argv[i] ? argv[i] : "NULL");
if (!strcmp(azColName[i],"instrument"))
{
instrument = atoi(argv[i]);
if (instrument != wod_reader->instrument) got_new_file++;
}
else if (!strcmp(azColName[i],"observed"))
{
observed = atoi(argv[i]);
if (observed != wod_reader->observed) got_new_file++;
}
else if (!strcmp(azColName[i],"quadrant"))
{
quadrant = atoi(argv[i]);
if (quadrant != wod_reader->quadrant) got_new_file++;
}
else if (!strcmp(azColName[i],"wmo_lat"))
{
wmo_lat = atoi(argv[i]);
if (wmo_lat != wod_reader->wmo_lat) got_new_file++;
}
else if (!strcmp(azColName[i],"wmo_lon"))
{
wmo_lon = atoi(argv[i]);
if (wmo_lon != wod_reader->wmo_lon) got_new_file++;
}
else if (!strcmp(azColName[i],"file_offset"))
{
file_offset = atoi(argv[i]);
}
else if (!strcmp(azColName[i],"num_bytes"))
{
num_bytes = atoi(argv[i]);
}
}
if (got_new_file)
{
if (wod_reader->debug) printf("Need to load a new WOD file!!\n");
// Got a new file, so close the old one...
wod_close_file(wod_reader);
memcpy(type_uppercase,wod_data_types[instrument],3);
type_lowercase[0] = tolower(type_uppercase[0]);
type_lowercase[1] = tolower(type_uppercase[1]);
type_lowercase[2] = tolower(type_uppercase[2]);
// Build up the filename
inname = calloc(strlen(wod_reader->wod_path)+50,1);
memset(inname,0,strlen(wod_reader->wod_path)+50);
// this the type of path that we're trying to construct: ctd/OBS/CTDO7807.gz
sprintf(inname,"%s/%s/OBS/%sO%d%d%0.2d.gz",
wod_reader->wod_path,type_lowercase,type_uppercase,quadrant,wmo_lat,wmo_lon);
if (wod_open_file(wod_reader,inname))
{
sprintf(inname,"%s/%s/OBS/%sO%d%d%0.2d",wod_reader->wod_path,type_lowercase,type_uppercase,
quadrant,wmo_lat,wmo_lon);
if (wod_open_file(wod_reader,inname))
{
printf("Can't open file %s ... non-existant??\n",inname);
free(inname);
inname = NULL;
return(1);
}
}
if (inname) free(inname);
inname = NULL;
}
if (wod_reader->debug) printf("%s: fseeking to %d for %d bytes\n",wod_reader->fname, file_offset,num_bytes);
// Seek to the right spot in the file
fseek(wod_reader->fp,file_offset,SEEK_SET);
// Now read the profile at that location
if ((iend = oclread (wod_reader)) == -1)
{
wod_close_file(wod_reader);
return(1);
}
// Convert the cast to OMG SVP format, this updates the extracted_watercolumns array
wod2svp(wod_reader);
return 0;
}
Now I've just got to build a query building utility in the graphical end that lets the user select instrument, month and/or other fields. Geographic bounds are currently selected by lasso in the map viewer and the month defaults to the current month.
I'm also planning on speeding up the oclread function (was delivered with the free NODC code that came with the WOD), right now it's reading things a single field at a time and I think it can be sped up a bit by reading the entire cast into memory and then parsing it from there.
jb
Subscribe to:
Posts (Atom)