Candidate Neural Network Architecture : PredNet

While I sketched out a provisional idea of how I reckoned the network could look, I’m doing what I can to avoid reinventing the wheel. As it happens there’s a Deep Learning problem with implemented solutions that I believe is close enough to the earthquake prediction problem to make a good starting point : predicting the next frame(s) in a video. You train the network on a load of sample video data, then at runtime give it a short sequence and let it figure out what happens next.

This may seem a bit random, but I think I have good justification. The kinds of videos people have been working with are things like human movement or the motion of a car. (Well, I’ve seen one notable, fun, exception : Adversarial Video Generation is applied to the activities of Ms. Pac-Man). In other words, a projection of objects obeying what is essentially Newtonian physics. Presumably seismic events follow the same kind of model. As mentioned in my last post, I’m currently planning on using online data that places seismic events on a map – providing the following: event time, latitude, longitude, depth and magnitude. The video prediction nets generally operate over time on x, y with R, G, B for colour. Quite a similar shape of data.

So I had a little trawl of what was out there. There is a surprisingly wide variety of strategies, but one in particular caught my eye : PredNet. This is described in the paper Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning (William Lotter, Gabriel Kreiman & David Cox from Harvard) and has supporting code etc. on GitHub. Several things about it appealed to me. It’s quite an elegant conceptual structure, which translates in practice into a mix of convnets/RNNs, not too far from what I anticipated needing for this application. This (from the paper) might give you an idea:

[Image: prednet-block – PredNet architecture diagram from the paper]

Another plus from my point of view was that the demo code is written using Keras on Tensorflow, exactly what I was intending to use.

Yesterday I had a go at getting it running.  Right away I hit a snag: I’ve got this laptop set up for Tensorflow etc. on Python 3, but the code uses hickle.py, which uses Python 2. I didn’t want to risk messing up my current setup (it took ages to get working) so I had a go at setting up a Docker container – Tensorflow has an image. Day-long story short, something wasn’t quite right. I suspect the issues I had were related to nvidia-docker, which is needed to run on GPUs.

Earlier today I decided to have a look at what would be needed to get the PredNet code Python 3-friendly. Running kitti_train.py (KITTI is the demo data set) led straight to an error in hickle.py. Nothing to lose, so I had a look. “Hickle is a HDF5 based clone of Pickle, with a twist. Instead of serializing to a pickle file, Hickle dumps to a HDF5 file.” There is a note saying Python 3 support is in progress, but the cause of the error turned out to be –

if isinstance(f, file):

file isn’t a thing in Python 3. But kitti_train.py was only passing a filename to this, via data_utils.py, so I just commented out the lines associated with that isinstance check. (I guess I should fix it properly and feed it back to Hickle’s developer.)
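For the record, rather than just zapping the check, something along these lines ought to make it Python 3-friendly. A quick sketch only – I haven’t tried it against the rest of hickle, and FILE_TYPES is just a name I made up:

import io

# Python 3 has no built-in `file` type, so build the tuple of "open file"
# types according to which interpreter is actually running.
try:
    FILE_TYPES = (file, io.IOBase)   # Python 2
except NameError:
    FILE_TYPES = (io.IOBase,)        # Python 3

# ...and the offending check in hickle.py would then become:
# if isinstance(f, FILE_TYPES):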

The hack worked! Well, at least for kitti_train.py. I’ve got it running in the background as I type. This laptop only has a very wimpy GPU (GeForce 920M) and it took a couple of tweaks to prevent near-immediate out-of-memory errors:

export TF_CUDNN_WORKSPACE_LIMIT_IN_MB=100

kitti_train.py, line 35
batch_size = 2 # was 4

It’s taken about an hour to get to epoch 2/150, but I did renice Python way down so I could get on with other things.

Seismic Data

I’ve also spent a couple of hours on the (seismic) data-collecting code. I’d foolishly started coding around this using Javascript/node, simply because it was the last language I’d done anything similar with. I’ve got very close to having it gather & filter blocks of events from the INGV service and dump them to a CSV file. But I reckon I’ll just ditch that and recode it in Python, so I can dump to HDF5 directly – it seems to be a popular format in the Deep Learning community.
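The Python version shouldn’t need to be much more than something like this. Very much a sketch: the INGV FDSN endpoint and the column positions in its plain-text output are assumptions I still need to check against the service docs:

import requests
import h5py
import numpy as np

INGV_EVENT_URL = "http://webservices.ingv.it/fdsnws/event/1/query"  # assumed endpoint

def fetch_events(start, end, min_mag=2.0):
    """Fetch events in [start, end) as (time, lat, lon, depth_km, magnitude) tuples."""
    params = {
        "starttime": start,        # e.g. "2017-01-01T00:00:00"
        "endtime": end,
        "minmagnitude": min_mag,
        "format": "text",          # one event per line, '|'-separated
    }
    resp = requests.get(INGV_EVENT_URL, params=params)
    resp.raise_for_status()
    events = []
    for line in resp.text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue               # skip header / blank lines
        cols = line.split("|")
        # assumed column order: EventID|Time|Lat|Lon|Depth/km|...|Magnitude|...
        events.append((cols[1], float(cols[2]), float(cols[3]),
                       float(cols[4]), float(cols[10])))
    return events

def dump_hdf5(events, path="ingv_events.h5"):
    """Dump the event fields straight to HDF5 rather than CSV."""
    times = np.array([e[0] for e in events], dtype="S32")         # ISO timestamps
    values = np.array([e[1:] for e in events], dtype=np.float32)  # lat, lon, depth, mag
    with h5py.File(path, "w") as h5:
        h5.create_dataset("time", data=times)
        h5.create_dataset("lat_lon_depth_mag", data=values)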

Radio Data

Yes, there’s that to think about too.

My gut feeling is that applying Deep Learning to the seismic data alone is likely to be somewhat useful for predictions. From what I’ve read, the current approaches being taken (in Italy at least) are effectively along these lines, leaning towards traditional statistical techniques. No doubt some folks are applying Deep Learning to the problem. But I’m hoping that bringing in radio precursors will make a major difference in prediction accuracy.

So far I have in mind generating spectrograms from the VLF/ELF signals. Which gives a series of images…sound familiar? However, I suspect that there won’t be quantitatively all that much information coming from this source (though qualitatively, I’m assuming, it will be vital). As a provisional plan I’m thinking of pushing it through a few convnet/pooling layers to get the dimensionality way down, then adding that data as another input to the PredNet.
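As a rough illustration of what I mean – all the layer sizes and the 128×128 spectrogram shape are placeholder guesses at this point:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Squash one spectrogram "image" down to a small feature vector that could
# then be fed in alongside the seismic input.
spectrogram_encoder = Sequential([
    Conv2D(16, (3, 3), activation='relu', padding='same',
           input_shape=(128, 128, 1)),   # single-channel spectrogram
    MaxPooling2D((2, 2)),                # -> 64 x 64
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),                # -> 32 x 32
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((4, 4)),                # -> 8 x 8
    Flatten(),
    Dense(64, activation='relu'),        # 64 numbers summarising the spectrogram
])
spectrogram_encoder.summary()

Exactly how that 64-dimensional vector gets merged into the PredNet input – extra channels, or concatenated at a higher layer – is something I still need to work out.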

Epoch 3/150 – woo-hoo!

PS.

It was taking way too long for my patience, so I changed the parameters a bit more:

nb_epoch = 50 # was 150
batch_size = 2 # was 4
samples_per_epoch = 250 # was 500
N_seq_val = 100 # number of sequences to use for validation

It took ~20 hours to train. Running kitti_evaluate.py produced some results, but it also exited with an error code. I’m a bit too tired to look into it now, but am very pleased to get a bunch of these:

[Images: sample prediction results from kitti_evaluate.py]

New Strategy for Seismic Data

I’m a massive procrastinator and not a very quick thinker, but there is something positive about all this: I’ve barely looked at code in the past few weeks, but I have been thinking about it, and may well have saved some time. (I usually get there in the end…)

The neural net setup I had in mind was based on the assumption that I’d have my own, local, data sources (sensors). But hardware is still on hold until I find some funds. So I’ve been re-evaluating how best to use existing online data.

Now I am pleased with the idea of taking the VLF radio data as spectrograms and treating them (conceptually) as images, so I can exploit existing Deep Learning setups. If I’m not getting the seismic data as time series from local sensors but from INGV, I can use the same trick. They have a nice straightforward Web Service offering Open Data as QuakeML (XML) over HTTP via URL queries. They also render it like this:

[Image: seismo-map – INGV’s map rendering of seismic events]

So I’m thinking of taking the magnitude & depth data from the web service and placing it in a grid (say 256×256) derived from the geo coordinates. To handle the time aspect, for each cell in the grid I reckon picking the max magnitude event over each 6 (?) hour window. And then (conceptually) treating this as a pixel map. I need to read up a little more, but this looks again like something that might well yield to convnets, constructed very like the radio data input.
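A first sketch of what building one such “frame” might look like – grid size, channels and the 6-hour windowing are all still up for grabs:

import numpy as np

def events_to_frame(events, lat_range, lon_range, grid=256):
    """Turn one time window's events into a grid 'frame'.

    events: iterable of (lat, lon, depth_km, magnitude) tuples.
    Channel 0 holds the max magnitude seen in each cell, channel 1 its depth.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    frame = np.zeros((grid, grid, 2), dtype=np.float32)
    for lat, lon, depth, mag in events:
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            continue                                 # ignore events outside the area
        row = int((lat - lat_min) / (lat_max - lat_min) * grid)
        col = int((lon - lon_min) / (lon_max - lon_min) * grid)
        if mag > frame[row, col, 0]:                 # keep the max-magnitude event per cell
            frame[row, col, 0] = mag
            frame[row, col, 1] = depth
    return frame

# One frame per 6-hour window, stacked up, gives a (time, 256, 256, 2) array -
# much the same shape as the video tensors the existing nets already expect.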

PS.

I made a little start on coding this up. First thing was to decide what area I wanted to monitor. Key considerations were: it must include here (self-preservation!); it must cover a fair distance around the VLF monitoring station whose data I’m going to use (Renato Romero’s, here); it should include the main seismically active regions likely to impact those two places.

You can get an idea of the active regions from the map above; here’s another one showing estimated risk:

[Image: hotspots – map of estimated seismic risk]

In one of the papers I read that the radio precursors are only really significant out to 100 km or so (to be confirmed), so I’ve chosen the area between the 4 outer markers here:

[Image: area – map with 4 outer markers showing the monitored region]

This corresponds to latitude 40N-47N, longitude 7E-15E.

The marker at the middle-left is the VLF station; the one in the middle is where I live.

It looked like the kind of thing where I could easily get my lats & longs mixed up, so I coded up the little map using Google Maps – it was very easy – and rendered the source on JSFiddle.

Next I need to get the code together to make the service requests, then filter & aggregate the seismic event data into some convenient form ready for network training.