The script starts by downloading the "directory file", which lists all available profile files. It looks like this:
# Title : Profile directory file of the Argo Global Data Assembly Center
# Description : The directory file describes all individual profile files of the argo GDAC ftp site.
# Project : ARGO
# Format version : 2.0
# Date of update : 20111206174544
# FTP root number 1 : ftp://ftp.ifremer.fr/ifremer/argo/dac
# FTP root number 2 : ftp://usgodae.usgodae.org/pub/outgoing/argo/dac
# GDAC node : FNMOC
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
aoml/13857/profiles/R13857_001.nc,19970729200300,0.267,-16.032,A,845,AO,20080918131927
aoml/13857/profiles/R13857_002.nc,19970809192112,0.072,-17.659,A,845,AO,20080918131929
aoml/13857/profiles/R13857_003.nc,19970820184544,0.543,-19.622,A,845,AO,20080918131931
aoml/13857/profiles/R13857_004.nc,19970831193905,1.256,-20.521,A,845,AO,20080918131933
aoml/13857/profiles/R13857_005.nc,19970911185807,0.720,-20.768,A,845,AO,20080918131934
aoml/13857/profiles/R13857_006.nc,19970922195701,1.756,-21.566,A,845,AO,20080918131936
aoml/13857/profiles/R13857_007.nc,19971003191549,2.595,-21.564,A,845,AO,20080918131938
aoml/13857/profiles/R13857_008.nc,19971014183934,1.761,-21.587,A,845,AO,20080918131940
aoml/13857/profiles/R13857_009.nc,19971025193234,1.804,-21.774,A,845,AO,20080918131941
aoml/13857/profiles/R13857_010.nc,19971105185142,1.642,-21.362,A,845,AO,20080918131943
aoml/13857/profiles/R13857_011.nc,19971116194909,1.708,-20.758,A,845,AO,20080918131945
aoml/13857/profiles/R13857_012.nc,19971127190705,2.048,-20.224,A,845,AO,20080918131947
aoml/13857/profiles/R13857_013.nc,19971208183912,2.087,-19.769,A,845,AO,20080918131948
aoml/13857/profiles/R13857_014.nc,19971219192355,2.674,-20.144,A,845,AO,20080918131950
aoml/13857/profiles/R13857_015.nc,19971230184421,2.890,-20.433,A,845,AO,20080918131952
aoml/13857/profiles/R13857_016.nc,19980110194140,2.818,-20.699,A,845,AO,20080918131954
aoml/13857/profiles/R13857_017.nc,19980121190033,2.940,-20.789,A,845,AO,20080918131956
aoml/13857/profiles/R13857_018.nc,19980201195831,3.224,-20.757,A,845,AO,20080918131957
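I parse the directory file looking for a date match in the date/time field (the 2nd field): the first eight characters of that field (yyyymmdd) are compared against today's date in UTC. You could easily modify this to limit the selection to a specific lat/lon bounding box or any other criterion.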
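For anyone curious about the bounding-box variation, here is a rough Python sketch of the same filtering idea. It is not part of my script, which only matches on the date; the function name `select_profiles` and the `bbox` argument are made up for illustration, and the sample text is just a few lines copied from the listing above.

```python
import csv
import io


def select_profiles(index_text, date_prefix, bbox=None):
    """Return the paths of profiles whose date field starts with date_prefix.

    bbox is an optional (lat_min, lat_max, lon_min, lon_max) tuple.
    Hypothetical helper, for illustration only.
    """
    # Drop the '#' header lines; the first remaining line holds the column names.
    rows = (line for line in io.StringIO(index_text) if not line.startswith("#"))
    selected = []
    for row in csv.DictReader(rows):
        if not (row["date"] or "").startswith(date_prefix):
            continue
        if bbox is not None:
            try:
                lat = float(row["latitude"])
                lon = float(row["longitude"])
            except (TypeError, ValueError):
                continue  # position missing or unparseable
            lat_min, lat_max, lon_min, lon_max = bbox
            if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
                continue
        selected.append(row["file"])
    return selected


sample = """\
# Title : Profile directory file of the Argo Global Data Assembly Center
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
aoml/13857/profiles/R13857_001.nc,19970729200300,0.267,-16.032,A,845,AO,20080918131927
aoml/13857/profiles/R13857_002.nc,19970809192112,0.072,-17.659,A,845,AO,20080918131929
aoml/13857/profiles/R13857_003.nc,19970820184544,0.543,-19.622,A,845,AO,20080918131931
"""

# All casts from August 1997
print(select_profiles(sample, "199708"))
# All casts from 1997 inside a small box
print(select_profiles(sample, "1997", bbox=(0.0, 1.0, -18.0, -16.0)))
```

Note that this simple box test does not handle a box that straddles the date line; that case would need the longitude check split in two.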
Here's the script:
#!/bin/bash

base_argo_url=ftp://usgodae.org/pub/outgoing/argo

# Download the profile index.
# stat -f "%m" is the BSD/macOS form (GNU stat would need -c "%Y").
# On the first run the index does not exist yet, so fall back to 0.
time1=`stat -f "%m" ar_index_global_prof.txt.gz 2>/dev/null || echo 0`
wget --timestamping $base_argo_url/ar_index_global_prof.txt.gz
time2=`stat -f "%m" ar_index_global_prof.txt.gz`

if [ $time1 -eq $time2 ]
then
    echo "Nothing to do...no changes since last run"
    exit
fi

# Get today's date
today=`date -u '+%Y%m%d'`
echo "today is" $today

# -p so that a second run on the same day does not complain
mkdir -p $today

# Skip the 9 header lines, then keep the file path (field 1) of every
# profile whose date field (field 2) starts with today's date
zcat ar_index_global_prof.txt.gz | awk -F, '{if (NR > 9 && substr($2,1,8) == '$today') print $1 }' > $today/todays_casts.txt

cd $today

num_files=`cat todays_casts.txt | wc -l`

if [ $num_files -eq 0 ]
then
    echo "Nothing to do...no files to download yet for" $today
    exit
fi

echo "Going to check" $num_files "files"

for f in `cat todays_casts.txt`; do
    echo "Doing file" $f
    if [ -e `basename $f` ]
    then
        # Skip files that have already been downloaded
        continue
    fi

    # Don't need time stamping here, we check locally for existence of the
    # .nc file so don't need to waste time requesting a listing from the FTP server
    wget $base_argo_url/dac/$f
done
What comes out of this is a directory for the current day (named yyyymmdd) with a set of netCDF files in it (.nc file extension). Each file represents a cast from a given instrument, for example 20111206/R1900847_089.nc.
I then use a Python script to read the .nc files and turn them into OMG/UNB format so that I can run comparisons against casts from RTOFS Global.
#!/usr/bin/env python2.6

import glob
import netCDF4
import numpy as np
import math
import datetime as dt
import matplotlib.pyplot as plt
import os

do_plot = True

if do_plot:
    plt.figure()
    plt.subplot(1,2,1)
    plt.xlabel("Temperature, deg C")
    plt.ylabel("Pressure, dbar")
    plt.hold(True)
    plt.subplot(1,2,2)
    plt.xlabel("Salinity, psu")
    plt.ylabel("Pressure, dbar")
    plt.hold(True)

for name in glob.glob('*.nc'):
    file = netCDF4.Dataset(name)

    latitude = file.variables['LATITUDE'][0]
    longitude = file.variables['LONGITUDE'][0]

    if math.isnan(latitude) or math.isnan(longitude):
        print " skipping NAN lat/lon"
        file.close()
        continue

    juld = file.variables['JULD'][0]

    # TODO: the reference date is stored in 'REFERENCE_DATE_TIME'
    refdate = dt.datetime(1950,1,1,0,0,0,0,tzinfo=None)
    castdate = refdate + dt.timedelta(days=juld)

    print name + " " + str(latitude) + " " + str(longitude) + " " + str(castdate)

    try:
        # Only deal with casts that have ALL the data we need
        p = file.variables['PRES'][0][:]
        t = file.variables['TEMP'][0][:]
        t_fill_value = file.variables['TEMP']._FillValue
        t_qc = file.variables['TEMP_QC'][0][:]
        s = file.variables['PSAL'][0][:]
        s_fill_value = file.variables['PSAL']._FillValue
        s_qc = file.variables['PSAL_QC'][0][:]
    except:
        file.close()
        continue

    # Replace masked data with NAN
    # This will fail if there is no masked data since netCDF4 returns
    # a regular numpy array if no masked data but returns a masked numpy array
    # if there is.
    try:
        t_mask = t.mask
        t[t_mask] = np.nan
    except:
        pass

    try:
        s_mask = s.mask
        s[s_mask] = np.nan
    except:
        pass

    try:
        p_mask = p.mask
        p[p_mask] = np.nan
    except:
        pass

    # Now filter based on quality control flags (we want 1, 2 or 5)
    t_ind = (t_qc == '1') | (t_qc == '2') | (t_qc == '5')
    s_ind = (s_qc == '1') | (s_qc == '2') | (s_qc == '5')

    # We only want to consider valid concurrent observations of T and S
    pair_ind = t_ind & s_ind

    if do_plot:
        plt.subplot(1,2,1)
        plt.plot(t[pair_ind],-p[pair_ind])
        plt.subplot(1,2,2)
        plt.plot(s[pair_ind],-p[pair_ind])

    t_filt = t[pair_ind]
    s_filt = s[pair_ind]
    p_filt = p[pair_ind]

    num_samples = t_filt.size

    # The original had "file.close" without parentheses, which never closed the file
    file.close()

    if num_samples == 0:
        print " Skipping " + name + " due to lack of data!"
        continue

if do_plot:
    plt.show()
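The TODO about the reference date could probably be handled along these lines. This is only a sketch in Python 3, assuming the REFERENCE_DATE_TIME value has already been read from the file as a 14-character YYYYMMDDHHMMSS string (how you decode it from the netCDF character array is left out here). The helper name `juld_to_datetime` is made up for illustration.

```python
import datetime as dt


def juld_to_datetime(juld_days, reference_date_time="19500101000000"):
    """Convert a JULD value (days since the reference date) to a datetime.

    reference_date_time is assumed to be a YYYYMMDDHHMMSS string; the default
    matches the 1950-01-01 reference hard-coded in the script.
    """
    ref = dt.datetime.strptime(reference_date_time, "%Y%m%d%H%M%S")
    return ref + dt.timedelta(days=float(juld_days))


print(juld_to_datetime(1.5))      # 1950-01-02 12:00:00
print(juld_to_datetime(22619.5))  # 2011-12-06 12:00:00
```

Passing the string read from the file instead of relying on the default removes the hard-coded 1950 date from the loop.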
Here's a plot of data from 2011-12-06 at 1:30PM, EST.
Here's a map showing the geographic distribution of the casts for this particular run (2011-12-06).
Still to do? Read up more about the various QC procedures applied to ARGO data and try to automate detection of casts that will mess up my comparison analysis (large chunks of missing data, etc). Here's a bit of light reading to get me started.
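As a starting point for the "large chunks of missing data" check, something like the following might do. It is purely a sketch: the function name `cast_is_usable` and both thresholds are my own placeholders, not anything taken from the ARGO QC documentation, and they would need tuning against real casts.

```python
import math


def cast_is_usable(pressure, good, min_samples=10, max_gap_dbar=100.0):
    """Return True if a cast has enough good samples and no large vertical gap.

    pressure: pressures in dbar; good: matching booleans (e.g. the T/S pair
    mask from the QC flags). Thresholds are illustrative assumptions only.
    """
    # Keep only the good, non-NaN pressures, sorted from shallow to deep
    p = sorted(
        float(v) for v, ok in zip(pressure, good)
        if ok and not math.isnan(float(v))
    )
    # Need at least two samples to measure a gap at all
    if len(p) < max(min_samples, 2):
        return False
    largest_gap = max(b - a for a, b in zip(p, p[1:]))
    return largest_gap <= max_gap_dbar


pressure = [10.0 * i for i in range(50)]             # 0, 10, ..., 490 dbar
all_good = [True] * 50
holed = [not (10 <= i <= 30) for i in range(50)]     # 100 to 300 dbar flagged bad

print(cast_is_usable(pressure, all_good))  # True
print(cast_is_usable(pressure, holed))     # False: 220 dbar gap
```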
Until next time...