(I.) Batch processing of LTA data with quick_adcp.py
The first step is to get through one complete pass of the
processing. Sufficient documentation exists (online or contained
within a CODAS installation) to train a person on processing LTA
data. There is far more documentation for processing of pingdata
(from the NB150), so you may still need to consult that for some
details.
The approach of this document is to take a stab at a black box and hope
it works. I will describe the black box, its inputs and outputs, and
some rough sanity checks on the data, referring liberally to the
existing documentation (linked only in the html version).
This document was originally written for a unix-savvy data-literate
person going on a cruise with both a NB150 (DAS2.48) and an OS75
(VmDAS). References are made to that cruise as an example.
CODAS (Common Ocean Data Access System) is fundamentally a database.
We load the averaged data into it and then, in various steps, access
the database and write files to disk, manipulate those files to create
different ones, and put information from those new files back into the
database.
Extractions are done with C programs, called as "action action.cnt"
(where "action.cnt" is an ascii file with a specified format containing
configurable options for "action"). Manipulations are done with a
combination of C and Matlab code. Usually the matlab code is a small
script or wrapper which contains variables to change and, when run,
calls outside scripts or functions to do the work. Changes to the
database are implemented with more C code. "LTA" refers to the
original "Long Term Averages" (5-minute averages, for instance)
assembled by VmDAS and written to disk as "*.LTA" files.
The acquisition PC is probably networked, and may be writing to a
networked disk. You need to get the data onto your computer. The
best way to do this is to make sure that the data directory is
exported (shared) for all to read, and then read the LTA files from
the share with your PC (on linux, use smbclient or smbmount).
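For example, from linux (a sketch only; the host, share, and user
names here are made up):

smbclient //ACQPC/adcp -U adcp    ### connect to the acquisition PC's share
smb: \> prompt                    ### toggle off per-file confirmation
smb: \> mget *.LTA                ### copy all the LTA files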
Because the steps in processing are quite regular, we have written a
Python script, quick_adcp.py, to automate them. It runs through the
steps, changing to the right directory, writing the control file (or
the matlab file), running it, changing back to the original directory,
and asking the user at each step whether to run the next one. All the
control files are suffixed with ".tmp", all the matlab files end in
"_tmp.m", and a record of the steps is written to a file ending in
".runlog".
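After a run you can see what was done, for example:

ls */*.tmp */*_tmp.m    ### the control files and matlab wrappers written
cat *.runlog            ### the record of steps that were run
                        ### (prefix depends on your setup, e.g. mv0407.runlog)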
Complete "processing" of ADCP data actually requires several
iterations:
- get the velocity data and the navigation into the database; clean
up (smooth) the navigation using an oceanic reference layer; calculate
residual phase and amplitude calibration values
- apply any phase or amplitude correction necessary (** see below);
clean up (smooth) the navigation again; recalculate the residual phase
and amplitude calibration values
- edit out bad bins, profiles, and bottom interference (for
historical reasons this is a two-stage process: (a) write ascii files
to the disk which contain information about the bad data; (b) apply
these flags to the database (we turn on bits indicating bad data, we do
not actually delete any of the data)); then clean up (smooth) the
navigation again and recalculate the residual phase and amplitude
calibration values
- make some matlab files suitable for vector or contour plotting
(**) Calibration values: You should determine the heading source used
for your data. See this link for more details. After the first pass
of processing, there may still be an offset in heading. Look for
"phase" under the "mean" and "median" columns in
cal/watertrk/adcpcal.out and cal/botmtrk/btcaluv.out. If there are
sufficient points in "edited" (at least 10 watertrack and 40
bottomtrack is reasonable), then use a value between the bottomtrack
and watertrack values for the rotation. Amplitude for the Melville
looks OK, so you shouldn't have to make that adjustment. Ultimately,
with enough points, we expect the final calibration amplitude mean to
be around 0.997-1.003 and phase to be around -0.01 to 0.01 (stddev
around 0.3-0.4 is good). In this example, we are starting with an
unknown calibration phase and amplitude, but once we know what they
are we can build them in on the first step (see part II).
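To inspect these after a pass, look at the ends of the two calibration
output files from the top of the processing directory, e.g.:

tail -20 cal/watertrk/adcpcal.out    ### watertrack phase and amplitude
tail -20 cal/botmtrk/btcaluv.out     ### bottomtrack phase and amplitude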
Data processing takes place in a directory created by "adcptree.py".
Do this once to start a new processing directory; processing takes
place in that directory. Manually, each step would take place within a
subdirectory devoted to a particular step (scanning, loading,
navigation-related, calibration-related, editing, etc). For each step,
quick_adcp.py changes directory to the appropriate location, runs the
step, and changes directory back. Quick_adcp.py needs to know various
things about the data, the most basic of which are:
variable name : what it is
------------- : ----------
yearbase      : current year
datadir       : where the data are
datafile_glob : wildcard expansion for files
              : !! QUOTE IT if it is on the command line
              : (do not quote if in a control file)
dbname        : 5-character basename for database
instname      : os75, os150, os38 (so output matlab files
              : have good depth defaults)
use_refsm     : reference layer calculation (choose one of):
              : use_refsm
              : use_smoothr
***NOTE*** quick_adcp.py must be run in one line of text. (HTML may
wrap the line, so don't be misled by the display.)
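If the single line is unwieldy, a Bourne-type shell will accept
backslash continuations (still one logical line of text):

quick_adcp.py --yearbase 2004 --dbname a0407 \
    --datadir /home/data/mv0407/adcp --datafile_glob "*.LTA" \
    --instname os75 --instclass os --datatype lta --use_refsm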
For data residing in /home/data/mv0407/adcp, named *.LTA, the
quick_adcp.py steps corresponding to the numbers above are:
- quick_adcp.py --yearbase 2004 --dbname a0407 --datadir
/home/data/mv0407/adcp --datafile_glob "*.LTA" --instname os75
--instclass os --datatype lta --use_refsm
- quick_adcp.py --yearbase 2004 --use_refsm --rotate_angle
-2.0 --steps2rerun rotate:navsteps:calib
- Go to the edit/ subdirectory, run gautoedit, and edit the data
(i.e. create the ascii files which will then become flags in the
database).
- quick_adcp.py --yearbase 2004 --use_refsm --instname os75
--steps2rerun apply_edit:navsteps:calib:matfiles
- matlab files for plotting (generated automatically when "matfiles"
is added to the "steps2rerun" switch; do "help load_adcp" in matlab
for info on reading them):
- vector/*.mat (1-hour, 50m averages)
- contour/*.mat (15-minute, 10m averages)
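A quick sanity check that the final products were written (the .mat
file names include the database name):

ls -l vector/*.mat contour/*.mat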
(II.) LTA: automated batch mode first pass, but editing is manual
(All the data is loaded to create one database.)
There is no such thing as "simply appending" to a CODAS database.
Therefore, the next step in automation is to delete the database (or
start a new processing directory) and automate the processing of one
whole batch of files. Start with one of:
- /bin/rm adcpdb/*blk
- or -
- make a new processing directory with adcptree.py (sketched below)
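The second option might look like this (the directory name is up to
you; check adcptree.py's help for any options your version requires):

adcptree.py mv0407_proc    ### made-up directory name
cd mv0407_proc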
You did the first run-through and found out what the phase and
amplitude should be. (NOTE: since the Ashtech has been replaced, a new
value is to be expected. But we found -2.0 from before, so we'll use
that in this example.)
Now, combining (first pass + calibrating) above becomes:
quick_adcp.py --yearbase 2004 --dbname a0407 --use_refsm --rotate_angle
-2.0 --datadir /home/data/mv0407/adcp --datafile_glob "*.LTA"
--instname os75 --instclass os --datatype lta --auto
(and run editing as above)
(III.) LTA: one-pass complete processing
(with previously-determined phase and amplitude corrections,
using default editing parameters)
(All the data is loaded to create one database.)
- /bin/rm adcpdb/*blk
- or -
- make a new processing directory with adcptree.py
Now, combining (first pass + calibrating + editing) becomes:
quick_adcp.py --yearbase 2004 --dbname a0407 --use_refsm --rotate_angle
-2.0 --datadir /home/data/mv0407/adcp --datafile_glob "*.LTA"
--instname os75 --instclass os --datatype lta --find_pflags --auto
If this is too hard to read, you can run it with a control file as:
quick_adcp.py --cntfile mv0407qpy.cnt
where the control file "mv0407qpy.cnt" would be
#### begin mv0407qpy.cnt. Anything after a "#" is a comment
--yearbase 2004 ### current year
--dbname a0407 ### prefix to adcpdb/*dir.blk
--use_refsm ### reference layer calculation
--rotate_angle -2.0 ### rotate by -2.0
# --rotate_amplitude 1.0 ### if you needed an amplitude correction
--datadir /home/data/mv0407/adcp ### look for data in this directory
--datafile_glob *.LTA ### the *.LTA files (not quoted in a cnt file)
--instname os75 ### so matlab files will have good depths
--instclass os ### necessary
--datatype lta ### necessary
--find_pflags ### find editing flags, apply when done
--auto ### don't ask. just do it
######## end of cnt file #####
If you are going to run this unattended, you might want to add the
--hidedisplay
option so Matlab won't throw figures up to the screen when it runs.
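An unattended run might then look like this (assuming your version
accepts --hidedisplay on the command line together with --cntfile;
otherwise add it as a line in the control file):

quick_adcp.py --cntfile mv0407qpy.cnt --hidedisplay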
(IV.) LTA: "On-demand" shipboard ADCP processing
(Updating complete processing with previously-determined phase and
amplitude corrections, using default editing parameters)
This section covers the "on-demand" option of ADCP processing available
in quick_adcp.py. You should be familiar with the earlier steps leading
up to this method: if you can't run the automated, edited, calibrated
batch mode, you won't be able to do incremental mode.
Continuing on from above, you now know how to get a calibrated, edited,
complete data set from one group of files. A CODAS database consists of
"blocks", files that each contain a number of profiles. A single file,
"a0407dir.blk", is a blockfile dictionary, keeping track of the
contents of each block file in the database. For LTA files, a new
block starts when a configuration changes (e.g. bottom track is
switched on or off), the previous block file has reached some size
limit, or a new file is loaded. The only way to "append" to the
database is to delete blockfiles starting at the end, working
backwards, until the next file to be loaded would also start a new
block. Each block file deletion must be accompanied by regenerating
the a0407dir.blk dictionary. This is (supposed to be) encapsulated in
the "--lastfiles N" option, where N is the number of files to redo:
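You can see this structure directly; the block numbering shown here is
illustrative, but the single *dir.blk dictionary is as described above:

ls adcpdb/
### a0407dir.blk  a0407001.blk  a0407002.blk  ...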
- run adcptree.py to create the processing directory
- at least once a day, and/or at the end of each cast (e.g. when
getting ready to process ladcp data), run:
quick_adcp.py --cntfile mv0407qpy.cnt
where the control file "mv0407qpy.cnt" would be
#### begin mv0407qpy.cnt. Anything after a "#" is a comment
--yearbase 2004 ### current year
--dbname a0407 ### prefix to adcpdb/*dir.blk
--use_refsm ### reference layer calculation
--rotate_angle -2.0 ### rotate by -2.0
# --rotate_amplitude 1.0 ### if you needed an amplitude correction
--datadir /home/data/mv0407/adcp ### look for data in this directory
--datafile_glob *.LTA ### the *.LTA files (not quoted in a cnt file)
--instname os75 ### so matlab files will have good depths
--instclass os ### necessary
--datatype lta ### necessary
--find_pflags ### find editing flags, apply when done
--auto ### don't ask. just do it
--lastfiles 1 ### should delete enough to reload the
### most recent file
######## end of cnt file #####
This particular combination of options has not been tested at sea.
Lucky you! If there is a problem, here are some probable causes ...
Things that might fail:
- cron is tricky with paths; make sure you can run the command on the
command line without incident before you try to do it with cron (see
the crontab sketch after this list).
- once you are in the "lastfiles" mode, you are stuck there; you
can't go back to batch processing. Always make sure you specify a
large enough N in "--lastfiles N" that you overlap the existing
database.
- if you have a failure that says "not monotonic", chances are high
that data got loaded into the database twice. CODAS can't deal with
repeated timestamps.
- at the beginning, there might be a problem with deleting back to
the beginning of the database, e.g. on day 2 if you say
"--lastfiles 3" it might croak.
... and a prudent approach to avoid failure: "On-demand batch
processing":
Follow the earlier steps for batch-processing, working your way up to
the most automated processing there (i.e. one step, with the correct
calibration values, but in batch mode). If you cannot do that, then
this "lastfiles" approach is doomed to fail. If you can do it, then
until you have 4 days of data loaded, repeat the following:
- delete the database (/bin/rm adcpdb/*blk)
- run the last version from README.vmadcp_batch (i.e. section III
above) to get a fully processed, edited, calibrated dataset
(To switch to incremental processing, when you reach several (4-6)
days of processed data, try adding "--lastfiles 1" to the control
file.)
If you still can't get "--lastfiles" to work, you can always do the
"on-demand batch mode" (see previous) for the whole cruise. Your
mv0407.runlog file will get big, and the edit/a*.asc files will get
big, but those can be deleted before each run if you wish.
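That cleanup before each rerun is just (file names from this example):

/bin/rm mv0407.runlog edit/a*.asc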