(I.)

Batch processing of pingdata with quick_adcp.py


The first step is to get through one complete pass of the processing. Sufficient documentation exists (online or contained within a CODAS installation) to train a person on processing pingdata, assuming they have the time and the desire to learn how to do it. The approach of this document is to take a stab at a black box and hope it works. I will describe the black box, the inputs and outputs, and some rough sanity checks on the data, liberally referring to the existing documentation (only linking to it in the HTML version).

This document was originally written for a unix-savvy data-literate person going on a cruise with both a NB150 (DAS2.48) and an OS75 (VmDAS). References are made to that cruise as an example.


CODAS (Common Ocean Data Access System) is fundamentally a database. We load the averaged data into it and then in various steps, access the database and write files to the disk, manipulate those files to create different ones, and put information from those new files back into the database.

Extractions are done with C programs, called as "action action.cnt" (where "action.cnt" is an ASCII file with a specified format that contains configurable options for "action"). Manipulations are done with a combination of C and Matlab code. Usually the Matlab code is a small script or wrapper that contains variables to change and, when run, calls outside scripts or functions to perform its duty. Changes to the database are implemented with more C code. "Pingdata" refers to the original 5-minute averages assembled by DAS2.48 and written to disk as "PINGDATA.???" files.
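For example, navigation extraction uses the "getnav" program, run as "getnav getnav.cnt". A minimal sketch follows (the keywords below are illustrative, not authoritative; the control files that adcptree.py installs contain the real ones, with comments):

   getnav getnav.cnt

   ### getnav.cnt (illustrative)
   dbname: ../adcpdb/a0407    ### which database to read
   output: a0407.nav          ### ascii navigation written to disk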

DAS2.48 runs on an old PC along with several additional programs: (1) a "watchdog", which reboots the computer if no DAS activity is seen within a specified time, and (2) "ue4", which logs Ashtech data, does QC on it, keeps track of the Ashtech-gyro correction, and logs GGA navigation.

There are two mechanisms for getting the 5-minute averaged data to your computer:
  1. Ue4 can also spit out the 5-minute data (a short blast of binary) on a serial port. With sufficient warning and Marine Tech willingness and participation, ue4 can be set up to do this. You must acquire the data on an appropriately-configured linux box with ser_bin (e.g. as one-day files called "ensemble.JJJ", where JJJ is the decimal yearday).
  2. The acquisition PC may conceivably be networked, or you could use sneakernet (stop data acquisition once a day and carry the pingdata files over to your computer on a zip disk or something).


In this document, "ensemble.JJJ" and "pingdata.???" are equivalent.

Because the steps in processing are quite regular, we have written a Python script, quick_adcp.py, to automate them. Quick_adcp.py runs through the steps, changing to the right directory, writing the control file (or the Matlab file), running it, and changing back to the original directory, asking the user at each step whether to run the next one. All the control files are suffixed with ".tmp", all the Matlab files end in "_tmp.m", and a record of steps is written to a file ending in ".runlog".
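To see what quick_adcp.py actually did on a given pass, the generated files are easy to inspect (a sketch; the runlog is named after your setup, e.g. mv0407.runlog as in part IV):

   ls */*.tmp */*_tmp.m      ### control files and Matlab wrappers it wrote
   cat mv0407.runlog         ### the record of steps it ran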

Complete "processing" of ADCP data actually requires several iterations:
  1. get the velocity data and the navigation into the database; clean up (smooth) the navigation using an oceanic reference layer; rotate the headings according to the Ashtech-gyro correction; calculate residual phase and amplitude calibration values
  2. apply any phase or amplitude correction necessary (** see below); clean up (smooth) the navigation using an oceanic reference layer; calculate residual phase and amplitude calibration values
  3. edit out bad bins, profiles, and bottom interference (for historical reasons this is a two-stage process: (a) write ASCII files to disk containing information about the bad data; (b) apply these flags to the database -- we turn on bits indicating bad data, we do not actually delete any of it); then clean up (smooth) the navigation using an oceanic reference layer and calculate residual phase and amplitude calibration values
  4. make some Matlab files suitable for vector or contour plotting

(**) Calibration values: After the Ashtech-gyro correction has been applied, there may still be an offset in heading. Look for "phase" under the "mean" and "median" columns of cal/watertrk/adccal.out and cal/botmtrk/btcaluv.out. If there are sufficient points in "edited" (at least 10 watertrack and 40 bottomtrack is reasonable), then use a value between the bottomtrack and watertrack values for the rotation. Amplitude for the Melville looks OK, so you shouldn't have to make that adjustment. Ultimately, with enough points, we expect the final calibration amplitude mean to be around 0.997-1.003 and phase to be around -0.01 to 0.01 (a stddev around 0.3-0.4 is good). In this example, we are starting with an unknown calibration phase and amplitude, but once we know what they are we can build them in on the first step (see part II).
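As a worked example (numbers invented for illustration): suppose cal/botmtrk/btcaluv.out shows an edited median phase of -1.8 from 60 points, and cal/watertrk/adccal.out shows -2.2 from 25 points. Both have sufficient points, so a rotation between the two, say -2.0, is a sensible choice; after applying it and rerunning the calibration steps, the residual phase should settle near zero (roughly -0.01 to 0.01).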

Data processing takes place in a directory created by "adcptree.py"; run it once to start a new processing directory. Done manually, each step would take place within a subdirectory devoted to a particular step (scanning, loading, navigation-related, calibration-related, editing, etc). For each step, quick_adcp.py changes directory to the appropriate location, runs the step, and changes directory back. Quick_adcp.py needs to know various things about the data, the most basic of which are:
  
variable name : what it is
------------- : ----------
yearbase      : current year
datadir       : where the data are
datafile_glob : wildcard expansion for data files
              : !! QUOTE IT if it is on the command line
              : (do not quote it in a control file)
dbname        : 5-character basename for the database
use_refsm     : reference layer calculation (choose one of
              : use_refsm or use_smoothr)

***NOTE***
The quick_adcp.py command must be given on one line of text.
HTML may wrap the line, so don't be misled.

For data residing in /home/data/mv0407/adcp, named ensemble.???, the quick_adcp.py steps corresponding to the numbers above are:

  1. quick_adcp.py --yearbase 2004 --dbname a0407 --datadir /home/data/mv0407/adcp --datafile_glob "ensemble.???" --use_refsm

  2. quick_adcp.py --yearbase 2004 --use_refsm --rotate_angle -2.0 --steps2rerun rotate:navsteps:calib

  3. the Matlab files are generated automatically when "matfiles" is added to the "steps2rerun" switch (in Matlab, do "help load_adcp" for information on reading them)


(II.)

pingdata: automated batch mode first pass,
but editing is manual

(all the data goes into one database)

There is no such thing as "simply appending" to a CODAS database. Therefore, the next step in automation is to delete the database (or start a new processing directory) and automate the processing of one whole batch of files. Start by

  1. deleting the old database: /bin/rm adcpdb/*blk
     - or -
  2. making a new processing directory with adcptree.py
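If you take the second route, a sketch (the directory name here is made up; check adcptree.py's own usage message for the required arguments on your installation):

   adcptree.py a0407proc     ### create a fresh processing tree
   cd a0407proc              ### all quick_adcp.py runs happen in here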

You did the first run-through and found out what the phase and amplitude should be. (NOTE: since the Ashtech has been replaced, a new value is to be expected. But we found -2.0 from before, so we'll use that in this example.)

Now, combining (first pass + calibrating) above becomes:

quick_adcp.py --yearbase 2004 --dbname a0407 --use_refsm --rotate_angle -2.0 --datadir /home/data/mv0407/adcp --datafile_glob "ensemble.???" --auto

(and run editing as above)


(III.)
pingdata: one-pass complete processing
(with previously-determined phase and amplitude corrections,
using default editing parameters)
(all the data goes into one database)

  1. delete the old database: /bin/rm adcpdb/*blk
     - or -
  2. make a new processing directory with adcptree.py

Now, combining (first pass + calibrating + editing) becomes:
quick_adcp.py --yearbase 2004 --dbname a0407 --use_refsm --rotate_angle -2.0 --datadir /home/data/mv0407/adcp --datafile_glob "ensemble.???" --find_pflags --auto

If this is too hard to read, you can run it with a control file as:
   
quick_adcp.py --cntfile mv0407qpy.cnt

where the control file "mv0407qpy.cnt" would be
#### begin mv0407qpy.cnt. Anything after a "#" is a comment
--yearbase 2004 ### current year
--dbname a0407 ### prefix to adcpdb/*dir.blk
--use_refsm ### reference layer calculation
--rotate_angle -2.0 ### rotate by -2.0
# --rotate_amplitude 1.0 ### if you needed an amplitude correction
--datadir /home/data/mv0407/adcp ### look for data in this directory
--datafile_glob "ensemble.???" ### look for ensemble.???
--find_pflags ### find editing flags, apply when done
--auto ### don't ask. just do it
######## end of cnt file #####



If you are going to run this unattended, you might want to add the
--hidedisplay
option so Matlab won't throw figures up to the screen when it runs.
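One way to run unattended, as a sketch using standard unix tools (the log file name is arbitrary; keep --auto and --hidedisplay in the control file so nothing blocks on a prompt or a display):

   nohup quick_adcp.py --cntfile mv0407qpy.cnt > mv0407qpy.log 2>&1 &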


(IV.)
pingdata: "On-demand" Shipboard ADCP processing
(Updating complete processing with
previously-determined phase and amplitude corrections,
using default editing parameters)

This section covers the "on-demand" option of ADCP processing available in quick_adcp.py. You should be familiar with the earlier steps leading up to this method -- if you can't run the automated, edited, calibrated batch mode, you won't be able to do incremental mode.

Continuing on from above, you now know how to get a calibrated, edited, complete data set from one group of files. A CODAS database consists of "blocks", files that contain a number of profiles. A single file, "a0407dir.blk", is the blockfile dictionary, keeping track of the contents of each block file in the database. For pingdata, a new block starts when a configuration changes (e.g. bottom track is switched on or off), the previous block file has reached some size limit, or a new file is loaded. The only way to "append" to the database is to delete the blockfiles starting at the end, deleting backwards until the next file to be loaded would also start a block. Each block file deletion must be accompanied by a request to regenerate the a0407dir.blk file.
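To make that concrete (block file names inferred from the "a0407" dbname and the 5-character naming convention above; your block count will differ):

   ls adcpdb/
   ### a0407dir.blk  a0407001.blk  a0407002.blk  a0407003.blk ...
   ### to reload the most recent data file, delete block files from the
   ### end until the next file to load would start a fresh block, then
   ### regenerate a0407dir.blk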

This is (supposed to be) encapsulated in the "--lastfiles N" option, where N is the number of files to redo:

  1. run adcptree.py to create the processing directory
  2. at least once a day, and/or at the end of each cast (when getting ready to process ladcp data), run:
 

quick_adcp.py --cntfile mv0407qpy.cnt

where the control file "mv0407qpy.cnt" would be

#### begin mv0407qpy.cnt. Anything after a "#" is a comment
--yearbase 2004 ### current year
--dbname a0407 ### prefix to adcpdb/*dir.blk
--use_refsm ### reference layer calculation
--rotate_angle -2.0 ### rotate by -2.0
# --rotate_amplitude 1.0 ### if you needed an amplitude correction
--datadir /home/data/mv0407/adcp ### look for data in this directory
--datafile_glob ensemble.??? ### look for ensemble.???
--find_pflags ### find editing flags, apply when done
--auto ### don't ask. just do it
--lastfiles 1 ### should delete enough to reload the
### most recent file
######## end of cnt file #####

This particular combination of options has not been tested at sea. Lucky you! Rather than enumerating everything that might fail, here is a prudent approach to avoid failure, "on-demand batch processing":

Follow the earlier steps for batch-processing, working your way up to the most automated processing there (i.e. one step, with the correct calibration values, but in a batch mode). If you cannot do that, then this "lastfiles" approach is doomed to fail. If you can do it, then until you have 4 days of data loaded, repeat the following:

  1. delete the database (/bin/rm adcpdb/*blk)
  2. run the last version of README.vmadcp_batch (III) to get a fully processed, edited, calibrated dataset

(To switch to incremental processing, when you reach several (4-6) days of processed data, try adding "--lastfiles 1" to the control file.)
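Once the incremental run works by hand, it can be scheduled. A sketch of a crontab entry (paths invented for illustration) that reruns it every 6 hours; this is where --hidedisplay (see above) matters, since cron jobs have no display:

   0 */6 * * * cd /home/me/a0407proc && quick_adcp.py --cntfile mv0407qpy.cnt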

If you still can't get "--lastfiles" to work, you can always do the "on-demand batch mode" (see previous) for the whole cruise. Your mv0407.runlog file will get big, and the edit/a*.asc files will get big, but those can be deleted before each run if you wish.
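For example, the pre-run cleanup can be as simple as (both are recreated as processing continues):

   /bin/rm mv0407.runlog edit/a*.asc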