Candidate Neural Network Architecture: PredNet

While I sketched out a provisional idea of how I reckoned the network could look, I’m doing what I can to avoid reinventing the wheel. As it happens there’s a Deep Learning problem with implemented solutions that I believe is close enough to the earthquake prediction problem to make a good starting point: predicting the next frame(s) in a video. You train the network on a load of sample video data, then at runtime give it a short sequence and let it figure out what happens next.

This may seem a bit random, but I think I have good justification. The kind of videos people have been working with are things like human movement or the motion of a car. (Well, I’ve seen one notable, fun exception: Adversarial Video Generation is applied to the activities of Ms. Pac-Man.) In other words, a projection of objects obeying what is essentially Newtonian physics. Presumably seismic events follow the same kind of model. As mentioned in my last post, I’m currently planning on using online data that places seismic events on a map, providing the following: event time, latitude, longitude, depth and magnitude. The video prediction nets generally operate over time on x, y with R, G, B for colour. Quite a similar shape of data.

So I had a little trawl of what was out there. There is a surprisingly wide variety of strategies, but one in particular caught my eye: PredNet. This is described in the paper Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning (William Lotter, Gabriel Kreiman & David Cox from Harvard) and has supporting code etc. on GitHub. Several things about it appealed to me. It’s quite an elegant conceptual structure, which translates in practice into a mix of convnets/RNNs, not too far from what I anticipated needing for this application. This (from the paper) might give you an idea:

[Image: PredNet block structure, from the paper]

Another plus from my point of view was that the demo code is written using Keras on TensorFlow, exactly what I was intending to use.

Yesterday I had a go at getting it running. Right away I hit a snag: I’ve got this laptop set up for TensorFlow etc. on Python 3, but the code uses hickle.py, which uses Python 2. I didn’t want to risk messing up my current setup (it took ages to get working) so I had a go at setting up a Docker container – TensorFlow has an image. Day-long story short, something wasn’t quite right. I suspect the issues I had related to nvidia-docker, which is needed to run on GPUs.

Earlier today I decided to have a look at what would be needed to get the PredNet code Python3-friendly. Running kitti_train.py (KITTI is the demo data set) led straight to an error in hickle.py. Nothing to lose, so I had a look. “Hickle is a HDF5 based clone of Pickle, with a twist. Instead of serializing to a pickle file, Hickle dumps to a HDF5 file.” There is a note saying Python3 support is in progress, but the cause of the error turned out to be –

if isinstance(f, file):

file isn’t a thing in Python3. But kitti_train.py was only passing a filename to this, via data_utils.py, so I just commented out the lines associated with the isinstance check. (I guess I should fix it properly and feed back to Hickle’s developer.)
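
For the record, a proper fix would presumably look something like this – a minimal sketch, not the actual Hickle patch (in Python 3 open files are instances of io.IOBase, so that’s the natural replacement check):

import io

def open_if_path(f, mode='r'):
    # Python 3 has no builtin 'file' type; check for an open
    # file-like object via io.IOBase, otherwise treat f as a path
    if isinstance(f, io.IOBase):
        return f
    if isinstance(f, str):
        return open(f, mode)
    raise ValueError("Don't know how to open %r" % (f,))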

It worked! Well, at least kitti_train.py did. I’ve got it running in the background as I type. This laptop only has a very wimpy GPU (a GeForce 920M) and it took a couple of tweaks to prevent near-immediate out-of-memory errors:

export TF_CUDNN_WORKSPACE_LIMIT_IN_MB=100

kitti_train.py, line 35
batch_size = 2 # was 4

It’s taken about an hour to get to epoch 2/150, but I did renice Python way down so I could get on with other things.

Seismic Data

I’ve also spent a couple of hours on the (seismic) data-collecting code. I’d foolishly started coding around this using Javascript/node, simply because it was the last language I’d done anything similar with. I’ve got very close to having it gather & filter blocks of events from the INGV service and dump them to a CSV file. But I reckon I’ll just ditch that and recode it in Python, so I can dump to HDF5 directly – it does seem a popular format around the Deep Learning community.
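
The HDF5 part at least should be trivial with h5py. A minimal sketch, assuming the events arrive as rows of numbers (with times already converted to e.g. Unix timestamps):

import h5py
import numpy as np

def dump_events(events, path='events.h5'):
    # events: list of (time, latitude, longitude, depth, magnitude) rows
    data = np.asarray(events, dtype='float64')
    with h5py.File(path, 'w') as f:
        f.create_dataset('events', data=data)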

Radio Data

Yes, there’s that to think about too.

My gut feeling is that applying Deep Learning to the seismic data alone is likely to be somewhat useful for predictions. From what I’ve read, the current approaches being taken (in Italy at least) are effectively along these lines, leaning towards traditional statistical techniques. No doubt some folks are applying Deep Learning to the problem. But I’m hoping that bringing in radio precursors will make a major difference in prediction accuracy.

So far I have in mind generating spectrograms from the VLF/ELF signals. Which gives a series of images… sound familiar? However, I suspect that there won’t be quantitatively all that much information coming from this source (though qualitatively I’m assuming it will be vital). As a provisional plan I’m thinking of pushing it through a few convnet/pooling layers to get the dimensionality way down, then adding that data as another input to the PredNet.
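
In Keras terms I’m picturing something roughly like this – just a sketch, with all the sizes plucked out of the air:

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

spec_in = Input(shape=(128, 128, 1))         # one VLF spectrogram frame
x = Conv2D(16, (3, 3), activation='relu', padding='same')(spec_in)
x = MaxPooling2D((4, 4))(x)                  # 128x128 -> 32x32
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((4, 4))(x)                  # 32x32 -> 8x8
x = Flatten()(x)
embedding = Dense(64, activation='relu')(x)  # small vector to feed in alongside the PredNet input
vlf_encoder = Model(spec_in, embedding)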

Epoch 3/150 – woo-hoo!

PS.

It was taking way too long for my patience, so I changed the parameters a bit more:

nb_epoch = 50 # was 150
batch_size = 2 # was 4
samples_per_epoch = 250 # was 500
N_seq_val = 100 # number of sequences to use for validation

It took ~20 hours to train. Running kitti_evaluate.py produced some results, but it also exited with an error code. Am a bit too tired to look into it now, but am very pleased to get a bunch of these:

[Images: sample prediction outputs from kitti_evaluate.py]


New Strategy for Seismic Data

I’m a massive procrastinator and not a very quick thinker, but there is something positive about all this: I’ve barely looked at code in the past few weeks, but I have been thinking about it, and may well have saved some time. (I usually get there in the end…)

The neural net setup I had in mind was based on the assumption that I’d have my own, local, data sources (sensors). But hardware is still on hold until I find some funds. So I’ve been re-evaluating how best to use existing online data.

Now, I’m pleased with the idea of taking the VLF radio data as spectrograms and treating them (conceptually) as images, so I can exploit existing Deep Learning setups. If I’m getting the seismic data not as time series from local sensor(s) but from INGV, I can use the same trick. They have a nice straightforward Web Service offering Open Data as QuakeML (XML) over HTTP via URL queries. They also render it like this:

[Image: INGV map of recent seismic events]

So I’m thinking of taking the magnitude & depth data from the web service and placing it in a grid (say 256×256) derived from the geo coordinates. To handle the time aspect, for each cell in the grid I reckon on picking the max-magnitude event over each 6(?)-hour window, and then (conceptually) treating this as a pixel map. I need to read up a little more, but this again looks like something that might well yield to convnets, constructed very like the radio data input.
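
A rough sketch of that aggregation, using the lat/long bounds I settle on in the PS below (everything here is provisional):

import numpy as np

LAT_MIN, LAT_MAX = 40.0, 47.0   # bounds chosen in the PS below
LON_MIN, LON_MAX = 7.0, 15.0
GRID = 256
WINDOW = 6 * 60 * 60            # 6-hour window, in seconds

def events_to_frames(events, t_start, n_windows):
    # events: iterable of (t, lat, lon, depth, mag), t in seconds
    frames = np.zeros((n_windows, GRID, GRID))
    for t, lat, lon, depth, mag in events:
        w = int((t - t_start) // WINDOW)
        i = int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * (GRID - 1))
        j = int((lon - LON_MIN) / (LON_MAX - LON_MIN) * (GRID - 1))
        if 0 <= w < n_windows and 0 <= i < GRID and 0 <= j < GRID:
            frames[w, i, j] = max(frames[w, i, j], mag)  # keep the max magnitude per cell
    return frames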

PS.

I made a little start on coding this up. The first thing was to decide what area I wanted to monitor. Key considerations were: it must include here (self-preservation!); it must cover a fair distance around the VLF monitoring station whose data I’m going to use (Renato Romero’s, here); and it should include the main seismically active regions likely to impact those two places.

You can get an idea of the active regions from the map above; here’s another one showing estimated risk:

[Image: map of estimated seismic risk hotspots]

In one of the papers I’ve read that the radio precursors are only really significant for 100km or so (to be confirmed), so I’ve chosen the area between the 4 outer markers here:

[Image: map with markers outlining the chosen area]

This corresponds to latitude 40N-47N, longitude 7E-15E.

The marker middle-left is the VLF station, the one in the middle is where I live.

It looked like the kind of thing where I could easily get my lats & longs mixed up, so I coded up the little map using Google Maps – it was very easy (source – rendered on JSFiddle).

Next I need to get the code together to make the service requests, then filter & aggregate the seismic event data into some convenient form ready for network training.

Provisional Graph

I’ve now located the minimum data sources needed to start putting together the neural network for this system. I now need to consider how to sample & shape this data. To this end I’ve roughed out a graph – it’s short on details and will undoubtedly change, but should be enough to decide on how to handle the inputs.

To reiterate the aim, I want to take ELF/VLF (and historical seismic) signals and use them to predict future seismic events.

As an overall development strategy, I’m starting with a target of the simplest thing that could possibly work, and iteratively moving towards something with a better chance of working.

Data Sources

I’ve not yet had a proper look at what’s available as archived data, but I’m pretty sure what’s needed will be available. The kind of anomalies that precede earthquakes will be relatively rare, so special-case signals will be important in training the network. However, the bulk of the training data and the runtime data will come from live online sources.

Seismic Data

Prior work (e.g. OPERA) suggests that clear radio precursors are usually only associated with fairly extreme events, and even those are only detectable using traditional means for geographically close earthquakes. The main hypothesis of this project is that Deep Learning techniques may pick up more subtle indicators, but all the same it makes sense to focus initially on more local, more significant events.

The Istituto Nazionale di Geofisica e Vulcanologia (INGV) provides heaps of data, local to Italy and worldwide. A recent event list can be found here. Of what they offer I found it easiest to code against their Atom feed, which gives weekly event summaries. (No surprise I found it easiest – I had a hand in the development of RFC 4287 🙂)

I’ve put together some basic code for GETting and parsing this feed.
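
The gist of it in a Python sketch (the element names follow the Atom/W3C Geo/Dublin Core description above, but the feed URL is a placeholder and my actual code may differ):

import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = 'http://example.org/ingv-weekly.atom'  # placeholder for the real feed URL

NS = {'atom': 'http://www.w3.org/2005/Atom',
      'geo': 'http://www.w3.org/2003/01/geo/wgs84_pos#',
      'dc': 'http://purl.org/dc/elements/1.1/'}

def fetch_events(url=FEED_URL):
    tree = ET.parse(urllib.request.urlopen(url))
    for entry in tree.getroot().findall('atom:entry', NS):
        title = entry.findtext('atom:title', namespaces=NS)  # magnitude is embedded in the title text
        when = entry.findtext('dc:date', namespaces=NS)
        lat = entry.findtext('geo:lat', namespaces=NS)
        lon = entry.findtext('geo:long', namespaces=NS)
        yield title, when, lat, lon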

Radio Data

The go-to site for natural ELF/VLF radio information is vlf.it, and its maintainer Renato Romero has a station located in northern Italy. The audio from this is streamed online (along with other channels) by Paul Nicholson. Reception, logging and some processing of this data is possible using Paul’s VLF Receiver Software Toolkit. I found it straightforward to get a simple spectrogram from Renato’s transmissions using these tools. I’ve not set up a script for logging yet, but I’ll probably get that done later today.

It will be desirable to visualise the VLF signal to look for interesting patterns and the best way of doing this is through spectrograms. Conveniently, this makes the problem of recognising anomalies essentially a visual recognition task – the kind of thing the Deep Learning literature is full of.

The Provisional Graph

Here we go –

[Image: provisional network graph, 2017-07-03]

CNN – convolutional neural network subsystem
RNN – recurrent neural network subsystem (probably LSTMs)
FCN – fully connected network (old-school backprop ANN)

This is what I’m picturing for the full training/runtime system. But I’m planning to set up pre-training sessions. Imagine RNN 3 and its connections removed. On the left will be a VLF subsystem and on the right a seismic subsystem.

Pre-Training

In this phase, data from VLF logs will be presented as a set of labeled spectrograms to a multi-layer convolutional network (CNN). VLF signals contain a variety of known patterns, which include:

  • Man-made noise – the big one is 50Hz mains hum (and its harmonics), but other sources include things like industrial machinery and submarine radio transmissions.
  • Sferics – atmospherics, the radio waves caused by lightning strikes in a direct path to the receiver. These appear as a random crackle of impulses.
  • Tweeks – these again are caused by lightning strikes but the impulses are stretched out through bouncing between the earth and the ionosphere. They sound like brief high-pitched pings.
  • Whistlers – the impulse of a lightning strike can find its way into the magnetosphere and follow a path to opposite side of the planet, possibly bouncing back repeatedly. These sound like descending slide whistles.
  • Choruses – these are caused by the solar wind hitting the magnetosphere and sound like a chorus of birds or frogs.
  • Other anomalous patterns – planet Earth and its environs are a very complex system and there are many other sources of signals. Amongst these (it is assumed here) will be earthquake precursors caused by geoelectric activity.

Sample audio recordings of the various signals can be found at vlf.it and Natural Radio Lab. They can be quite bizarre. The key reference on these is Renato Romero’s book Radio Nature – strongly recommended to anyone with any interest in this field. It’s available in English and Italian (I got my copy from Amazon).

So…with the RNN 3 path out of the picture, it should be feasible to set up the VLF subsystem as a straightforward image classifier.
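
As a sketch, that pre-training classifier could hardly be more standard in Keras (layer sizes and the input shape are placeholders, and the labels just mirror the list above):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

labels = ['man-made', 'sferic', 'tweek', 'whistler', 'chorus', 'other']

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(len(labels), activation='softmax'),  # one output per signal class
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])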

On the right hand side, the seismic section, I imagine the pre-training phase being a series of stages, at least: seismic data -> RNN 1; seismic data -> RNN 1 -> RNN 2. If you’ve read The Unreasonable Effectiveness of Recurrent Neural Networks (better still, played with the code – I got it to write a Semantic Web “specification”) you will be aware of how good LSTMs can be at picking up patterns in series. The underlying system behind geological events will clearly be a lot more complex than the rules of English grammar & syntax, but I’m (reasonably) assuming that sequences of events, i.e. predictable patterns, do occur in geological systems. While I’m pretty certain that this alone won’t allow useful prediction with today’s technology, it should add information to the system as a whole in the form of probabilistic ‘shapes’. Work already done elsewhere would seem to bear this out (e.g. see A Deep Neural Network to identify foreshocks in real time).
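
For the seismic stages, a minimal Keras sketch of the kind of stack I mean (all shapes are placeholders – e.g. 28 six-hour steps of a flattened coarse grid per sequence):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(28, 256)),  # RNN 1
    LSTM(64),                                                # RNN 2 stacked on top
    Dense(256),                                              # predict the next step's grid
])
model.compile(optimizer='adam', loss='mse')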

Training & Prediction

Once the two subsystems have been pre-trained for what seems a reasonable length of time, I’ll glue them together, retaining the learnt weights. The VLF spectrograms will now be presented as a temporal sequence, and I strongly suspect the time dimension will have significance in this data, hence the insertion of extra memory in the form of RNN 3.

At this point I currently envisage training the system in real time using live data feeds.  (So the seismic sequence on the right will be time now, and the inputs on the left will be now-n). I’m not entirely sure yet how best to flip between training and predicting, worst case periodically cloning the whole system and copying weights across.

A more difficult unknown for me right now is how best to handle the latency between (assumed) precursors and events.  The precursors may appear hours, days, weeks or more before the earthquakes. While I’m working on the input sections I think I need to read up a lot more on Deep Learning & cross-correlation.

Reading online VLF

For the core of the VLF handling section of the neural nets, my current idea couldn’t be much more straightforward. Take periodic spectrograms of the signal(s) and use them as input to a CNN-based visual recognition system. There are loads of setups for these available online. The ‘labeling’ part will (somehow) come from the seismic data handling section (probably based around an RNN). This is the kind of pattern that hopefully the network will be able to recognise (the blobby bits around 5kHz):

[Image: spectrogram from Nardi & Caputo showing a candidate precursor pattern around 5kHz]

“Spectrogramme of the signal recorded on September 10, 2003 and concerning the earthquake with magnitude 5.2 that occurred in the Tosco Emiliano Apennines, at a distance of about 270 km from the station, on September 14, 2003.” From Nardi & Caputo, A perspective electric earthquake precursor observed in the Apennines.

It’ll be a while yet before I’ll have my own VLF receiver set up, but in the meantime various VLF receiver stations have live data online, available through vlf.it. This can be listened to in a browser, e.g. Renato Romero’s feed from near Turin at http://78.46.38.217:80/vlf15 (have a listen!).

So how to receive the data and generate spectrograms? Like a fool I jumped right in without reading around enough. I wasted a lot of time taking the data over HTTP from the link above into Python and trying to get it into a usable form from there. That data is transmitted using Icecast, specifically as an Ogg Vorbis stream. But the docs are thin on the ground, so decoding the stream became an issue. It appears that an Ogg header is sent once, then a continuous stream. There I got stuck: I couldn’t make sense of the encoding, which led me back to the docs on how the transmission was done. Ouch! I really had made a rod for my own back.

Reading around Paul Nicholson’s pages on the server setup, it turns out that the data is much more readily available with the aid of Paul’s VLF Receiver Software Toolkit. This is a bunch of Unixy modules. I’ve still a way to go in putting together suitable shell scripts – definitely not my forte – but it shouldn’t be too difficult: within half an hour I was able to get the following image:

[Image: first spectrogram captured from the live VLF stream]

First I installed vlfrx-tools (a straightforward source configure/make install, though note that on the latest Ubuntu the prerequisite is libpng-dev, not libpng12-dev). Then I ran the following:

vtvorbis -dn 78.46.38.217,4415 @vlf15

– this takes Renato’s stream and decodes it into buffer @vlf15.

With that running, in another terminal I ran:

vtcat -E30 @vlf15 | vtsgram -p200 -b300 -s '-z60 -Z-30' > img.png

– which pulls out 30 seconds from the buffer and pipes it to a script wrapping the Sox audio utility to generate the spectrogram.
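
Scripting the logging should then just be a matter of wrapping that pipeline in a loop. A hypothetical Python stand-in for the shell script I have in mind:

import subprocess
import time

# every 30s, pull the last 30s from the @vlf15 buffer and render a
# timestamped spectrogram using the same vtcat | vtsgram pipeline as above
while True:
    stamp = time.strftime('%Y%m%d-%H%M%S')
    cmd = ("vtcat -E30 @vlf15 | "
           "vtsgram -p200 -b300 -s '-z60 -Z-30' > spec-%s.png" % stamp)
    subprocess.call(cmd, shell=True)
    time.sleep(30)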


Progress (of sorts)

[work in progress 2017-04-15 – material was getting long so splitting off to separate pages]

General Status

In the last month or so I’ve done a little hardware experimentation on the radio side of things, along with quite a lot of research all around the subjects. I also had a small setback: about 3 months ago I ordered a pile of components, but they never arrived – a screw-up by the distributor (I wasn’t charged). Unfortunately I don’t have the funds right now to re-order, so although I do have the resources for some limited experimentation, I can’t go full speed on the hardware. (Once I’ve got a little further along, I reckon I’ll add a ‘Donate’ button to this site – worth a try!)

This might be a blessing in disguise. Being constrained in what I can do has forced me to re-evaluate the plans. In short, in the near future I’m going to have to rely mostly on existing online data sources and simplify wherever possible. This is really following the motto of Keep It Simple, Stupid, something I should maybe have been considering from the start.


Next Steps

Localising Online Data – baby steps

While I’m intending to build a simple seismometer in the near future, there’s a lot of data already available online. Generally this seems to be available in two forms: near real-time output of seismometers and feeds of discrete events. I’ve not yet started investigating the former, as the latter seems an easier entry point to start coding around.

I’m based in northern Italy, and as it happens there are some very good resources online for this region (hardly surprising, as earthquakes are a significant threat to life & property in Italy). A hub for these is the Istituto Nazionale di Geofisica e Vulcanologia (INGV).

Web Services

The service with the snappy name Event Federation of Digital Seismograph Networks Web Services allows queries with a variety of parameters, returning data in a choice of three formats: QuakeML (a rich, dedicated XML), KML (another XML – Google Earth’s Keyhole Markup Language) and plain text. Hat tip to the folks behind this: it’s Open Data (CC Attribution).
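
A quick sketch of the kind of query I have in mind – note that the endpoint URL and the column positions in the text format are my assumptions, to be checked against the service docs:

import requests

URL = 'http://webservices.ingv.it/fdsnws/event/1/query'  # assumed endpoint

params = {
    'starttime': '2017-06-01T00:00:00',
    'endtime': '2017-06-08T00:00:00',
    'minlatitude': 40, 'maxlatitude': 47,
    'minlongitude': 7, 'maxlongitude': 15,
    'format': 'text',   # pipe-separated, one event per line
}

for line in requests.get(URL, params=params).text.splitlines():
    if line.startswith('#'):   # header row
        continue
    f = line.split('|')
    # assumed layout: EventID|Time|Latitude|Longitude|Depth/km|...|Magnitude|...
    when, lat, lon = f[1], float(f[2]), float(f[3])
    depth, mag = float(f[4]), float(f[10])
    print(when, lat, lon, depth, mag)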

The INGV also offers a preset translation of the previous week’s events in Atom format (augmented with Dublin Core, W3C Geo & GeoRSS terms for event datetime and location). This is a minimal version of data from the Web service, but contains all I’m interested in for now: event magnitude, location & time. (I also have a personal bias towards Atom – it’s the first, and I think only, RFC in which my name appears 🙂)

Relocating the Event

The potential utility of this project lies in assessing the risk from seismic events as felt at a specific location. The feed gives the magnitude of each event at its epicentre, so it may be reasonable to process the raw data to reflect the effect at the target location.

Whether it will be better to feed the neural nets such data raw (i.e. separate inputs for magnitude, latitude & longitude) or pre-processed remains to be seen. But to flag up high risk at the target location, some combined measure is desirable.

The true values depend on a huge number of factors (for more on this see e.g. The attenuation of seismic intensity by Chiara Pasolini). As a first attempt at a useful approximation I’ll do the following:

  1. restrict data to events within 200km of the ‘home’ location
  2. apply an inverse-square function to the magnitude over the distance

Both are guesstimates of relative event significance. Although major events beyond 200km may well have a significant impact locally, for the purposes of prediction it seems reasonable to exclude them. The data is going to be fed to a machine learning system looking for patterns, and intuitively at least it makes sense to reduce the target model to that of the local geology rather than that of the whole Earth.

The recorded magnitude of an event (as in the Richter magnitude scale) is more or less a log10 of the event’s power. The compensations measuring stations make to allow for their distance aren’t exactly straightforward. But seismic shaking is very roughly similar to sound, and the intensity of sound at a distance d from a point source is proportional to 1/d². This may be wildly different from the true function even under ideal conditions, but it does introduce the distance factor.
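
In log terms the inverse-square weighting comes out very simply: intensity ∝ 10^M / d², so the ‘local magnitude’ is just M − 2·log10(d). A sketch, on the understanding that this is the guesstimate described above:

import math

def local_significance(mag, dist_km):
    # magnitude ~ log10(power), so an inverse-square attenuation
    # becomes a simple subtraction in log space
    d = max(dist_km, 1.0)   # avoid a blow-up near the epicentre
    return mag - 2.0 * math.log10(d)

print(local_significance(5.2, 100.0))   # M5.2 at 100km -> 1.2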

(If you’re interested in looking into the propagation side of this further, the ElastoDynamics Toolbox for Matlab looks promising).

Anyhow, I’ve roughed this out for node.js; the code’s up on GitHub. A bit of tweaking of the fields was needed (e.g. the magnitude appears in the text of the <title> element). It looks like it works OK, but I’ve only done limited testing – shoving the output into a spreadsheet and making a chart. The machine I’m working on right now is old and has limited memory, so it’s a bit slow for much of that.

[Image: chart of the processed event data]

The red dot is the home location.