
Thursday, July 5, 2012

Splitting up Kongsberg Watercolumn files

Sometimes you just forget to save Kongsberg multibeam water column data into a separate file. This just happened to me on a cruise, and I found that the >2GB file sizes made my 32-bit software puke. Luckily, the file sizes didn't exceed 4GB, so I decided to write a file splitter in Python that pulls apart the original .all file and outputs a new .all file purged of water column datagrams AND a separate .wcd file. Here's my first cut at it. It's a script that you feed a list of filenames; for each file it creates a "split" subdirectory alongside the input and writes the split .all/.wcd pair there. For example, the file:

    20120702/0003_20120702_122222_FK_EM710.all

will split into:

    20120702/split/0003_20120702_122222_FK_EM710.all    
    20120702/split/0003_20120702_122222_FK_EM710.wcd

Give it a try, I hope it doesn't nuke your data.
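The split rule in the script boils down to a lookup on the datagram type byte. Here's a minimal Python 3 sketch of just that rule, using the same IDs the script checks: 0x49/0x69 (installation parameters start/stop, 'I'/'i'), 0x52 (runtime parameters, 'R') and 0x55 (sound speed profile, 'U') are copied into both files, 0x6B ('k', water column) goes only into the .wcd file, and everything else stays in the .all file. The `destinations` function and the constant names are hypothetical, just for illustration.

```python
# Datagram type bytes checked by the splitter (hypothetical names, Python 3 sketch)
INSTALL_START = 0x49  # 'I'
INSTALL_STOP = 0x69   # 'i'
RUNTIME = 0x52        # 'R'
SOUND_SPEED = 0x55    # 'U'
WATER_COLUMN = 0x6B   # 'k'

# Datagram types that the script copies into both output files
SHARED = {INSTALL_START, INSTALL_STOP, RUNTIME, SOUND_SPEED}


def destinations(dgm_type):
    """Return which output files ('all', 'wcd') a datagram type is written to."""
    if dgm_type in SHARED:
        return {"all", "wcd"}
    if dgm_type == WATER_COLUMN:
        return {"wcd"}
    return {"all"}


print(sorted(destinations(ord("k"))))  # ['wcd']
print(sorted(destinations(ord("R"))))  # ['all', 'wcd']
print(sorted(destinations(ord("P"))))  # ['all']
```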


#!/usr/bin/env python2.6


import os
import struct
import time
import sys 


file_count=0
debug=False


dir="split"


for filename in sys.argv:


    file_count += 1


    if (file_count == 1): 
        # I'm too lazy to parse command line args so just skipping over the 
        # script name (which is arg zero in the list)
        continue


    file = open(filename, 'rb')
    filesize = os.path.getsize(filename)


    # What is the path to the input file without the filename?
    filepath=os.path.dirname(filename)
    fileprefix=os.path.basename(filename)
    if debug or True:
        print "Doing file",filename
        print "Got file path",filepath
        print "Got file basename",fileprefix


    # Join the file's directory path with the usual output subdirectory name
    outdir=os.path.join(filepath,dir)


    if not os.path.exists(outdir):
        os.makedirs(outdir)


    split_allname = os.path.join(outdir, fileprefix)
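    # Swap only the extension: str.replace would also hit ".all" elsewhere in the
    # path, and would leave a name without ".all" unchanged (clobbering the .all output)
    split_wcdname = os.path.splitext(split_allname)[0] + ".wcd"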


    if debug or True:
        print filename, "will split into", split_allname, split_wcdname



    if not os.path.exists(split_allname):
        split_allfile = open(split_allname,"wb")
        split_wcdfile = open(split_wcdname,"wb")
    else:
        print "Skipping", filename, "since it's already split!"
        file.close()
        continue


    last_percent = 0
    while True:


        # Make sure we don't try to read beyond the EOF
        if (file.tell() + 6 > filesize):
            break


        line = file.read(6)


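        # 4-byte datagram length (counts the bytes after this field), then STX
        # and the datagram type byte; assumes little-endian byte order
        header = struct.unpack("<IBB", line)


        length = header[0]
        stx = header[1]
        id = header[2]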


        if (stx != 2):
            if debug:
                print 'STX not found, trying next datagram at position',file.tell()-5
            file.seek(-5,1)
            continue


        if debug:
            print 'STX found, going to try for ETX now'


        # Make sure we don't try to read beyond the EOF
        if (file.tell() + (length-5) > filesize):
            file.seek(-5,1)
            continue


        file.seek(length-5,1)


        # Make sure we don't try to read beyond the EOF
        if (file.tell() + 3 > filesize):
            break


        line = file.read(3)
        # ETX byte followed by a 2-byte checksum (read but not verified here)
        footer = struct.unpack("<BH", line)
        etx = footer[0]
        checksum = footer[1]

        if (etx != 3):
            if debug:
                print 'ETX not found, trying next datagram at position',file.tell()-(length+3)
            file.seek(-(length+3),1)
            continue

        # Rewind to very beginning of the datagram, including the length field
        file.seek(-(length+4),1)
        data = file.read(length+4)

        if debug:
            print "Got id", id, "and length", length

        if (id == 0x49 or id == 0x69 or id == 0x52 or id == 0x55):
            # Installation parameters ('I' start, 'i' stop), runtime
            # parameters ('R') and sound speed profiles ('U') go in both files
            split_allfile.write(data)
            split_wcdfile.write(data)
        elif (id == 0x6B):
            # Water column datagrams ('k') go only into the .wcd file
            split_wcdfile.write(data)
        else:
            # Everything else goes into the purged .all file
            split_allfile.write(data)

        percent=int(100.0 * file.tell()/filesize)

        if (percent%5 == 0 and percent != last_percent):
            print percent, "% done, ALL:",split_allfile.tell()," WCD:",split_wcdfile.tell()
            last_percent = percent

        if file.tell() >= filesize:
            break

    file.close()
    split_allfile.close()
    split_wcdfile.close()

print 'All done!'
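The heart of the loop above is the datagram framing: a 4-byte length (counting every byte after the length field), STX (0x02) and the type byte, then the body, then ETX (0x03) and a 2-byte checksum. Whenever STX or ETX isn't where it should be, the script steps forward one byte and tries again. Below is a minimal Python 3 sketch of the same walk over an in-memory buffer, assuming little-endian byte order. Unlike the script, it also checks the checksum; I'm assuming the checksum is the 16-bit sum of the bytes strictly between STX and ETX, so verify that against the Kongsberg datagram description before trusting it. `iter_datagrams` and `make_datagram` are hypothetical helpers.

```python
import struct

STX, ETX = 0x02, 0x03


def iter_datagrams(buf, verify_checksum=True):
    """Yield (type_byte, raw_bytes) for each well-framed datagram in buf.

    Assumes little-endian framing: <length:u32><STX><type>...<ETX><checksum:u16>,
    where length counts every byte after the 4-byte length field.
    """
    pos = 0
    while pos + 6 <= len(buf):
        length, stx, dgm_type = struct.unpack_from("<IBB", buf, pos)
        end = pos + 4 + length  # one past the last checksum byte
        if stx != STX or length < 5 or end > len(buf):
            pos += 1  # not a datagram start: slide forward one byte and resync
            continue
        etx, checksum = struct.unpack_from("<BH", buf, end - 3)
        # Assumption: checksum = sum of bytes strictly between STX and ETX, mod 2**16
        body_sum = sum(buf[pos + 5:end - 3]) & 0xFFFF
        if etx != ETX or (verify_checksum and body_sum != checksum):
            pos += 1
            continue
        yield dgm_type, buf[pos:end]
        pos = end


def make_datagram(dgm_type, body=b""):
    """Build one little-endian datagram (hypothetical helper for testing)."""
    payload = bytes([dgm_type]) + body  # the bytes between STX and ETX
    checksum = sum(payload) & 0xFFFF
    length = 1 + len(payload) + 1 + 2  # STX + payload + ETX + checksum
    return struct.pack("<IB", length, STX) + payload + struct.pack("<BH", ETX, checksum)


# Two datagrams with a stray junk byte between them; the junk is skipped.
data = make_datagram(0x52, b"runtime") + b"\xff" + make_datagram(0x6B, b"wc")
print([hex(t) for t, _ in iter_datagrams(data)])  # ['0x52', '0x6b']
```

Reading a whole multi-gigabyte file into memory isn't practical, which is why the script seeks through the file instead; `struct.unpack_from` also works on an `mmap` object if you want this style of loop on real files.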



7 comments:

  1. It happened to us as well: we stored water column data, and it's killing my HD. I tried your script, but it gives me a couple of errors. First, I don't have Python 2.6, so I tried with 2.7. The next set of errors perhaps has to do with the change of version: the interpreter cannot deal with header/footer = struct.unpack(":

    header = struct.unpack("
    ^
    SyntaxError: EOL while scanning string literal

    Is this a true syntax error, or is it a matter of Python version? Note that I'm not a Python programmer, so perhaps my question is silly.

    Thanks for the post!

    1. Sorry for the wait; I rarely jump on this blog and just saw these questions today.

      Anyway, I think I found the problem: the script got mangled when I posted it in the blog. I'll track it down when I'm at work tomorrow and will repost.

    2. Did up a new post today; you'll find a safer script there.

      http://2bitbrain.blogspot.com/2012/11/splitting-kongsberg-water-column-data.html
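For anyone wondering what the mangling was: an HTML editor treats a raw `<` as the start of a tag, so everything from a `<` up to the next `>` disappears. That's most likely why `struct.unpack(` calls, whose format strings presumably began with `<` (little-endian), came out truncated right after the opening quote. A minimal sketch of escaping a line before pasting it into a post, using Python 3's standard `html.escape`; the format string shown is an assumption about what was lost.

```python
import html

# A line from the script as it should read (the format string is assumed here)
line = 'header = struct.unpack("<IBB", line)'

# Escape &, < and > (quote=False leaves the quote characters alone)
print(html.escape(line, quote=False))
# -> header = struct.unpack("&lt;IBB", line)
```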

  2. Hi:

    I made some changes to the previous script. Now it works as expected. Please comment.


    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    # This python script strips .all EM710 files into wcd (water column data) and .all bottom info.
    # Adapted from http://2bitbrain.blogspot.com.es/2012/07/splitting-up-kongsberg-watercolumn.html

    import os
    import struct
    import time
    import sys


    file_count=0
    debug=False


    dir="split"


    for filename in sys.argv:


        file_count += 1


        if (file_count == 1):
            # I'm too lazy to parse command line args so just skipping over the
            # script name (which is arg zero in the list)
            continue


        file = open(filename, 'rb')
        filesize = os.path.getsize(filename)


        # What is the path to the input file without the filename?
        filepath=os.path.dirname(filename)
        fileprefix=os.path.basename(filename)
        if debug or True:
            print "Doing file",filename
            print "Got file path",filepath
            print "Got file basename",fileprefix


        # Join the file's directory path with the usual output subdirectory name
        outdir=os.path.join(filepath,dir)


        if not os.path.exists(outdir):
            os.makedirs(outdir)


        split_allname = os.path.join(outdir, fileprefix)
        split_wcdname = split_allname.replace(".all",".wcd")


        if debug or True:
            print filename, "will split into", split_allname, split_wcdname



        # if not os.path.exists(split_allname):
        split_allfile = open(split_allname,"wb")
        split_wcdfile = open(split_wcdname,"wb")
        # else:
        # print "Skipping", filename, "since it's already split!"
        # file.close()
        # continue


        last_percent = 0
        while True:


            # Make sure we don't try to read beyond the EOF
            if (file.tell() + 6 > filesize):
                break


            line = file.read(6)

            header = struct.unpack(' filesize):
                file.seek(-5,1)
                continue


            file.seek(length-5,1)


            # Make sure we don't try to read beyond the EOF
            if (file.tell() + 3 > filesize):
                break


            line = file.read(3)
            footer = struct.unpack("= filesize:
                break

        file.close()
        split_allfile.close()
        split_wcdfile.close()

    print 'All done!'

    1. I can't see how this would work; there are a lot of missing bits from my original script... Have you tested it?

  3. How do I actually run this script? Any hints? I'm not a Python user.
    Thanks in advance; it will be very useful for me. :-)

    1. You'll need to install Python on your computer: visit http://www.python.org/download/ and follow the instructions. I've only used 2.7, and I really don't know how well my script will run with versions 3+.

      I'd search for some tutorials on working with Python after that. It would take me a very long time to explain how to set everything up from scratch, and I would probably lead you astray very quickly.
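Once Python is installed, the script is run from a terminal with the .all files as arguments, e.g. `python split_wcd.py 20120702/*.all` (the script filename here is just an example). The original skips `sys.argv[0]` by hand; a minimal Python 3 sketch of the same interface built on the standard `argparse` module might look like this. `build_parser`, `expand`, `main` and the `--outdir-name`/`--debug` options are hypothetical, not part of the original script, and the actual splitting is left as a placeholder.

```python
import argparse
import glob


def build_parser():
    """Command-line interface for the splitter (hypothetical Python 3 front end)."""
    parser = argparse.ArgumentParser(
        description="Split Kongsberg .all files into a .all without water column "
                    "datagrams plus a separate .wcd file.")
    parser.add_argument("files", nargs="+", help=".all files (wildcards allowed)")
    parser.add_argument("--outdir-name", default="split",
                        help="subdirectory created next to each input file")
    parser.add_argument("--debug", action="store_true",
                        help="print per-datagram details")
    return parser


def expand(patterns):
    """Expand wildcards ourselves, since Windows cmd.exe passes '*.all' through literally."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        files.extend(matches if matches else [pattern])
    return files


def main(argv=None):
    args = build_parser().parse_args(argv)
    for name in expand(args.files):
        # Placeholder: the real script would split `name` here
        print("would split", name, "into", args.outdir_name)
```

To use it as a script, add `if __name__ == "__main__": main()` at the bottom; argparse then also gives you a `--help` message for free.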
