Seismic Data – fixed?

As described in my last post, I was seeing significant gaps in the seismic event data I was retrieving from the INGV service. So I re-read their docs. Silly me, I’d missed the option to include query arguments restricting the geographic area of the events (I already had code doing that filtering in a subsequent script).

While tweaking the code to cover these parameters I also spotted a really clumsy mistake. I had a function doing more or less this –

for each event element in XML DOM:
        extract event data
        add event data to list
        return list

D’oh! Should have been –

for each event element in XML DOM:
        extract event data
        add event data to list
return list
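In real code that’s roughly the following (a sketch using ElementTree; extract_event_data is just a stand-in for the real field extraction, and the QuakeML namespace is from memory rather than checked against a response):

import xml.etree.ElementTree as ET

# QuakeML body namespace - from memory, worth verifying against a real response
QML_NS = "{http://quakeml.org/xmlns/bed/1.2}"

def extract_event_data(event):
    # stand-in for the real extraction; here just the event's publicID attribute
    return event.get("publicID")

def parse_events(xml_text):
    """Return one entry per <event> element in the XML DOM."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.iter(QML_NS + "event"):
        events.append(extract_event_data(event))
    return events  # return outside the loop, so we keep every event, not just the first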

I’ve also improved error handling considerably, discriminating between genuine HTTP errors and HTTP 204 No Content. Now that I’ve narrowed the geo area and reduced the time window for each GET down to 1 hour, there are quite a lot of 204s.
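The discrimination itself is trivial with requests, roughly this (a sketch, with the URL and parameter handling elided):

import requests

def fetch_window(url, params):
    """GET one time window; None means a quiet hour rather than a failure."""
    response = requests.get(url, params=params, timeout=300)  # now a 5-minute timeout
    if response.status_code == 204:
        return None                # 204 No Content: no events in this window
    response.raise_for_status()    # genuine HTTP errors raise an exception
    return response.text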

I’m now running it over the time period around the l’Aquila quakes as a sanity check. Jeez, 20+ events in some hours, 10+ in most.

Assuming this works OK, I’ll run it over the whole 1997-2017 period; hopefully in ~12 hours’ time I’ll have some usable data.

PS. Looking good, for the 30 days following that of the l’Aquila big one, it produced:

in_zone_count = 8877
max_depth = 62800.0
max_magnitude = 6.1


Seismic Data Wrangling

Following my interim plan of proceeding software-only (until I’ve the funds to get back to playing with hardware), I’ve been looking at getting seismic event data from the INGV Web Service into a Keras/Tensorflow implementation of PredNet.

My code is on GitHub, and rather than linking to individual files which I may rename, I’ll put a README over there with pointers.

As a first step, I put together code to pull down the data and dump it to simple CSV files. This appeared to be working. The demo implementation of PredNet takes HDF5 data from the KITTI vision dataset (videos from a car driving on roads around Karlsruhe), extracting it into numpy arrays, with the PredNet engine itself using Keras. To keep things simple I wanted to follow the same approach. I’m totally new to HDF5 so pinged Bill Lotter of the PredNet project for clues. He kindly gave me some helpful tips, and concurred with what I’d been thinking – keep the CSV data, then process that into something PredNet can consume.
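I haven’t pinned down the exact layout yet, but the CSV-to-HDF5 step will be something like this (a sketch using h5py and numpy; the “events” dataset name is my own placeholder, not anything PredNet mandates):

import csv
import numpy as np
import h5py

def csv_to_hdf5(csv_path, h5_path):
    """Load the event CSV and dump its numeric columns into an HDF5 dataset."""
    rows = []
    with open(csv_path) as f:
        for datetime_str, lat, lon, depth, mag in csv.reader(f):
            rows.append([float(lat), float(lon), float(depth), float(mag)])
    data = np.array(rows, dtype=np.float32)
    with h5py.File(h5_path, "w") as h5:
        h5.create_dataset("events", data=data)  # placeholder dataset name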

The data offered by the Web Service is good XML delivered over HTTP (props to INGV). But it does include a lot of material (provenance, measurement accuracy etc) that isn’t needed here. So my service-to-CSV code parses out just the relevant parts, producing a line for each event:

datetime, latitude, longitude, depth, magnitude

e.g.

2007-01-02T05:28:38.870000, 43.612, 12.493, 7700, 1.7
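The extraction behind each of those lines is just a few lookups per event element. Something along these lines (a sketch against QuakeML 1.2 as I understand it; the element paths are worth double-checking against an actual response):

import xml.etree.ElementTree as ET

NS = {"q": "http://quakeml.org/xmlns/bed/1.2"}

def event_to_csv_line(event):
    """Reduce one QuakeML <event> element to the five fields above."""
    def value(path):
        return event.find(path + "/q:value", NS).text
    time = value("q:origin/q:time")
    lat = value("q:origin/q:latitude")
    lon = value("q:origin/q:longitude")
    depth = value("q:origin/q:depth")      # metres, per QuakeML convention
    mag = value("q:magnitude/q:mag")
    return ", ".join([time, lat, lon, depth, mag])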

I couldn’t find the info anywhere, but it appears that the INGV service records go back at least to the early 1990s, so I chose 1997-01-01T00:00:00 as a convenient start datetime, giving me 20 years of events.

For this to be a similar shape to what PredNet expects, I will aggregate the events within each time period (actually, I think, taking the most significant event in that period). I reckon 6-hour periods should be about right. This also seemed a reasonable window for calling the service (not). I’ll filter the events down to just those within the region of interest (northern Italy, see earlier post) and then scale the latitude & longitude to an easy integer range (probably onto a 128 × 128 grid). For a first pass I’ll ignore the depth field.
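As a rough shape for that aggregation (a sketch; the bounding box, the grid size and “most significant = highest magnitude” are all assumptions I may yet revisit):

# Assumed region of interest (rough northern-Italy bounding box) and grid size
LAT_MIN, LAT_MAX = 43.0, 47.0
LON_MIN, LON_MAX = 7.0, 14.0
GRID = 128
PERIOD_HOURS = 6

def bin_index(dt, start):
    """Which 6-hour period an event's datetime falls into."""
    return int((dt - start).total_seconds() // (PERIOD_HOURS * 3600))

def to_grid(lat, lon):
    """Scale latitude/longitude to integer grid coordinates in [0, GRID-1]."""
    x = int((lon - LON_MIN) / (LON_MAX - LON_MIN) * (GRID - 1))
    y = int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * (GRID - 1))
    return x, y

def aggregate(events, start):
    """Keep the highest-magnitude in-region event per period: {period: (x, y, mag)}."""
    frames = {}
    for dt, lat, lon, depth, mag in events:
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            continue
        period = bin_index(dt, start)
        if period not in frames or mag > frames[period][2]:
            frames[period] = (*to_grid(lat, lon), mag)
    return frames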

As it happens, I’m well on the way to having implemented this. But along the way I did a few sanity checks, e.g. checking the maximum event magnitude in the region of interest (I got 4.1), and it turned out I was missing some rather significant data points. Here’s one I checked for:

The 2009 L’Aquila earthquake occurred in the region of Abruzzo, in central Italy. The main shock occurred at 03:32 CEST (01:32 UTC) on 6 April 2009, and was rated 5.8 or 5.9 on the Richter magnitude scale and 6.3 on the moment magnitude scale; its epicentre was near L’Aquila, the capital of Abruzzo, which together with surrounding villages suffered most damage.

Nope, it wasn’t in the CSV, but the Web Service knows all about it:

http://webservices.ingv.it/fdsnws/event/1/query?eventId=1895389

Doing a service call for that whole day:

http://webservices.ingv.it/fdsnws/event/1/query?starttime=2009-04-06T00:00:00&endtime=2009-04-06T23:59:59

–  yields 877 events – nightmare day!

I’d set the timeout on the HTTP calls to 2 seconds, but there is so much data associated with each event that this was woefully inadequate. I’ve since upped it to 5 minutes.

Manually checking calls, I was also sometimes getting an HTTP status code of 413 Request Entity Too Large. This puzzled me mightily – still does, actually. It says request entity, not requested (or response) entity, but the way it behaves is as if the requested response is too large. Either way, I reckon the spec (the latest being RFC 7231) is a little open to misinterpretation here. (What the heck – I’ve mailed the IETF HTTP list about it – heh, well well, I’ve co-chaired something with the chair…).

Anyhow, I’ve also tweaked the code to make calls over just 1-hour windows; hopefully it’ll now pick up the stuff it was missing.
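Generating the windows is the easy bit (a sketch; the service base URL as above):

from datetime import datetime, timedelta

BASE_URL = "http://webservices.ingv.it/fdsnws/event/1/query"

def hour_windows(start, end):
    """Yield (starttime, endtime) pairs covering [start, end) in 1-hour steps."""
    t = start
    while t < end:
        yield t, t + timedelta(hours=1)
        t = t + timedelta(hours=1)

# e.g. one GET per hour over the nightmare day above
for s, e in hour_windows(datetime(2009, 4, 6), datetime(2009, 4, 7)):
    params = {"starttime": s.isoformat(), "endtime": e.isoformat()}
    # hand BASE_URL and params to the HTTP-calling code (5-minute timeout and all)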

Hmm…I’ve got it running now and it’s giving errors throughout the year 2000, which should be trouble-free. I think I’ll have to have it make several passes/retries to ensure I get the maximum data available.
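The retry pass will probably look something like this (a sketch; the fetch callable and the bookkeeping are placeholders):

import time

def fetch_with_retries(window, fetch, max_passes=3, backoff_seconds=30):
    """Try a single time window a few times before giving up on it."""
    for attempt in range(max_passes):
        try:
            return fetch(window)       # fetch() is whatever does the GET and parse
        except Exception as err:       # deliberately broad: log it and retry
            print(f"pass {attempt + 1} failed for {window}: {err}")
            time.sleep(backoff_seconds)
    return None                        # give up for now; catch this window on a later run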

Drat! It’s giving me Entity Too Large even with just 1-hour windows, e.g.

http://webservices.ingv.it/fdsnws/event/1/query?starttime=2000-12-13T01:00:00&endtime=2000-12-13T02:00:00

I need to fix this…