Following my interim plan of proceeding software-only (until I’ve the funds to get back to playing with hardware), I’ve been looking at getting seismic event data from the INGV Web Service into a Keras/Tensorflow implementation of PredNet.
My code is on GitHub, and rather than linking to individual files which I may rename, I’ll put a README over there with pointers.
As a first step, I put together code to pull down the data and dump it to simple CSV files. This appeared to be working. The demo implementation of PredNet takes HDF5 data from the KITTI vision dataset (videos from a car on roads around Karlsruhe), extracting it into numpy arrays, with the PredNet engine using Keras. To keep things simple I wanted to follow the same approach. I’m totally new to HDF5 so I pinged Bill Lotter of the PredNet project for clues. He kindly gave me some helpful tips, and concurred with what I’d been thinking – keep the CSV data, then process that into something PredNet can consume.
The data offered by the Web Service is good XML delivered over HTTP (props to INGV). But it does include a lot of material (provenance, measurement accuracy, etc.) that isn’t needed here. So my service-to-CSV code parses out just the relevant parts, producing a line for each event:
datetime, latitude, longitude, depth, magnitude
e.g.
2007-01-02T05:28:38.870000, 43.612, 12.493, 7700, 1.7
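The parsing step can be sketched with stdlib ElementTree. I’m assuming here that the service returns QuakeML – the element names below (origin/time, latitude, depth, mag) come from the QuakeML 1.2 schema and are my assumption about the response shape, not taken from the actual code:

```python
# Sketch of the service-to-CSV parsing, assuming a QuakeML response.
# Namespaces are stripped so the code doesn't depend on the exact xmlns.
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace prefix from a tag name."""
    return tag.rsplit('}', 1)[-1]

def find_value(elem, name):
    """Return the text of <name><value>...</value></name> under elem."""
    for child in elem.iter():
        if local(child.tag) == name:
            for sub in child:
                if local(sub.tag) == 'value':
                    return sub.text
    return None

def events_to_csv(xml_text):
    """Yield one 'datetime, latitude, longitude, depth, magnitude' line per event."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local(elem.tag) != 'event':
            continue
        origin = next(e for e in elem.iter() if local(e.tag) == 'origin')
        mag = next(e for e in elem.iter() if local(e.tag) == 'magnitude')
        yield ', '.join([
            find_value(origin, 'time'),
            find_value(origin, 'latitude'),
            find_value(origin, 'longitude'),
            find_value(origin, 'depth'),   # metres in QuakeML
            find_value(mag, 'mag'),
        ])
```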
I couldn’t find the info anywhere, but it appears that the INGV service records go back at least to somewhere in the early 1990s, so I chose 1997-01-01T00:00:00 as a convenient start datetime, giving me 20 years of events.
For this to be a similar shape to what PredNet expects, I will aggregate events within a particular time period (in practice, taking the most significant event in each period). I reckon 6 hour periods should be about right. This also seemed a reasonable window for calling the service (not). I’ll filter down the events to just those within the region of interest (northern Italy, see earlier post) and then scale the latitude & longitude to an easy integer range (probably a 128×128 grid). For a first pass I’ll ignore the depth field.
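That aggregation can be sketched in a few lines of stdlib Python. The window length and grid size below follow the plan above, but the region bounds are invented for illustration – the real coordinates for northern Italy are in the earlier post, not here:

```python
# Sketch: bin events into 6-hour windows, keep the largest-magnitude
# event per window, scale lat/lon onto a 128x128 integer grid.
# Region bounds are placeholder values, not the real region of interest.
import csv
from datetime import datetime, timedelta
from io import StringIO

WINDOW = timedelta(hours=6)
EPOCH = datetime(1997, 1, 1)           # chosen start datetime
LAT_MIN, LAT_MAX = 43.0, 47.0          # hypothetical region bounds
LON_MIN, LON_MAX = 7.0, 14.0
GRID = 128

def to_cell(value, lo, hi):
    """Scale a coordinate to an integer cell index in [0, GRID)."""
    return min(GRID - 1, int((value - lo) / (hi - lo) * GRID))

def aggregate(csv_text):
    """Return {window index: (lat cell, lon cell, magnitude)}."""
    windows = {}
    for dt_s, lat_s, lon_s, depth_s, mag_s in csv.reader(StringIO(csv_text)):
        dt = datetime.fromisoformat(dt_s.strip())
        lat, lon, mag = float(lat_s), float(lon_s), float(mag_s)
        # depth_s ignored for this first pass
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            continue                    # outside the region of interest
        idx = int((dt - EPOCH) / WINDOW)   # 6-hour window number
        if idx not in windows or mag > windows[idx][2]:
            windows[idx] = (to_cell(lat, LAT_MIN, LAT_MAX),
                            to_cell(lon, LON_MIN, LON_MAX), mag)
    return windows
```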
As it happens, I’m well on the way to having implemented this. But along the way I did a few sanity checks, e.g. checking the maximum event magnitude in the region of interest (I got 4.1), and it turned out I was missing some rather significant data points. Here’s one I checked for:
The 2009 L’Aquila earthquake occurred in the region of Abruzzo, in central Italy. The main shock occurred at 03:32 CEST (01:32 UTC) on 6 April 2009, and was rated 5.8 or 5.9 on the Richter magnitude scale and 6.3 on the moment magnitude scale; its epicentre was near L’Aquila, the capital of Abruzzo, which together with surrounding villages suffered most damage.
Nope, it wasn’t in the CSV, but the Web Service knows all about it:
Doing a service call for that whole day:
– yields 877 events – nightmare day!
I’d set the timeout on the HTTP calls to 2 seconds, but there is so much data associated with each event that this was woefully inadequate. I’ve since upped it to 5 minutes.
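The fetch now looks something like the sketch below. I’m assuming the standard FDSNWS event interface with its starttime/endtime parameters – the endpoint URL here is my guess at INGV’s, not confirmed from the code:

```python
# Sketch of one service call with the raised timeout.
# ENDPOINT and parameter names assume the standard FDSNWS event interface.
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = 'http://webservices.ingv.it/fdsnws/event/1/query'

def event_query_url(start, end):
    """Build a query URL for all events between start and end."""
    return ENDPOINT + '?' + urlencode({'starttime': start, 'endtime': end})

def fetch_events(start, end, timeout=300):
    """Fetch the XML for one window; timeout upped from 2 s to 5 minutes."""
    with urlopen(event_query_url(start, end), timeout=timeout) as resp:
        return resp.read()
```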
Manually checking calls, I was also sometimes getting an HTTP status code of 413 Request Entity Too Large. This puzzled me mightily – still does, actually. It says request entity, not requested (or response) entity, but the way it’s behaving is that the response requested is too large. Either way, I reckon the spec (latest is RFC 7231) is a little open to misinterpretation here. (What the heck – I’ve mailed the IETF HTTP list about it – heh, well well, I’ve co-chaired something with the chair…).
Anyhow, I’ve also tweaked the code to make calls over just 1-hour windows, hopefully it’ll now get the stuff it was missing.
Hmm…I’ve got it running now and it’s giving errors throughout the year 2000, which should be trouble-free. I think I’ll have to have it make several passes/retries to ensure I get the maximum data available.
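One way to do those passes – keep a list of the windows that failed and re-run just those until nothing fails or a pass limit is hit. The names here are illustrative, not from my actual code; `fetch` is any callable that raises on error:

```python
# Sketch of multi-pass retries over a set of query windows.
# `fetch` stands in for whatever makes the actual service call.
def fetch_with_passes(windows, fetch, max_passes=3):
    """Return {window: data}; windows that still fail are left out."""
    results, pending = {}, list(windows)
    for _ in range(max_passes):
        failed = []
        for w in pending:
            try:
                results[w] = fetch(w)
            except Exception:
                failed.append(w)     # try again on the next pass
        if not failed:
            break
        pending = failed
    return results
```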
Drat! It’s giving me Entity Too Large with just 1-hour windows, e.g.
I need to fix this…