Kaggle Earthquake Prediction Challenge

I nearly had kittens when danbri pointed this challenge out to me. I’ve thought for a long time that earthquake prediction is well within the scope of machine learning, and have been dismayed at how little uptake there has been. (Surprised too, given the co-location of so many tech people and the San Andreas fault…) Hopefully this competition will change things. The deadline is in 3 months, and there are pretty significant $ prizes.

My second reaction was: “Great! I can work on ELFQuake and maybe win a prize!”. But that isn’t the whole picture. The competition is based around data generated in the lab, essentially by squashing a rock and recording its fractures. Apparently a reasonable approximation of geological effects.

I forget offhand where, but I’ve also seen a project aiming to use machine learning on real data. But that project, like this competition, is in my view missing a trick. My gut says that although algorithms can almost certainly be useful in predicting seismic events, using the seismic data alone for training is a blinkered approach. These events, in the real world, occur as the result of the behaviour of a massively complex system. There are practical limitations on what can be modeled, but I’d suggest that it’s possible to creep a little further into the real world by bringing in data from other natural sources. For example, does the position of the moon influence the timing of events? It does seem at least credible, given that its gravity is enough to pull the oceans around.

The data source I reckon looks most promising is natural radio: signals that have been shown sometimes to contain artifacts associated with subsequent seismic events. This is the hypothesis behind this project, ELFQuake (ELF stands for Extremely Low Frequency; it’s in this range, and that of VLF, Very Low Frequency, that earthquake precursors seem to occur).

For me, the Kaggle challenge has acted as a nudge to get me moving on ELFQuake software again. What’s more, material has already appeared covering the basic setup needed to process this data – data that has a lot in common with the ELFQuake targets. This is very convenient for me: although I’m now getting somewhat familiar with the principles and algorithms of Deep Learning, my practical experience is virtually non-existent, so I’ve been given a great leg-up. Here’s some material using sklearn: video, github. The toolkit that seems to me the most appealing option is Keras on TensorFlow (on Anaconda), but a lot of the pre/post Python wrangling will be the same.
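Just to make the shape of the thing concrete, here’s the kind of minimal Keras setup I have in mind – a sketch only, with placeholder feature extraction and layer sizes, nothing tuned for the competition data:

  import numpy as np
  from tensorflow import keras  # Keras running on TensorFlow

  # Placeholder data: windows of summary features extracted from the acoustic
  # signal, each labelled with the time remaining until the next failure event.
  X_train = np.random.rand(1000, 16)   # 1000 windows x 16 features (illustrative)
  y_train = np.random.rand(1000)       # time-to-failure per window (illustrative)

  model = keras.Sequential([
      keras.layers.Dense(64, activation="relu", input_shape=(16,)),
      keras.layers.Dense(32, activation="relu"),
      keras.layers.Dense(1)            # regression output: predicted time-to-failure
  ])
  model.compile(optimizer="adam", loss="mae")  # mean absolute error suits a time-to-failure target
  model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)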

I’ve put my name down for the competition. It’s a bit of extra motivation – potential $$$s! – to work on this stuff, and whatever I get together for it can be used as a placeholder in the end-to-end system I’m aiming for (seismic & radio sensors -> data acquisition -> [magic] -> Twitter notifications).

Happy New Year!

A good time to take stock, huddled in front of the fire.

Boo!

As is often the case, I’ve been moving more slowly on this project than I’d have liked. Lack of resources is a continuing problem, but my own tendency to procrastinate has been by far the biggest obstacle to progress. On top of this, my main dev computer packed up recently, so until I can get that fixed or replaced I’m getting things set up again on an old laptop. Frustrating.

Three Steps Forward…

My strategy of taking a multi-pronged approach has had its pros and cons. I’ve got a prototype VLF receiver mostly built and have spent quite a lot of time playing around with Arduinos and related devices. On the software side – which is really the novel aspect of this project – I did make reasonable progress, getting together a provisional system design and some of the implementation. But then stalled. My desire to build hardware to allow local data collection has been something of a distraction, when there’s nothing stopping me from working with data from INGV and VLF.it.

Plans.

Looking ahead, I really need to reboot myself on the software dev. The ultimate target for running code will be nothing more sophisticated than this laptop. But for exploring algorithms, and probably for NN training prior to optimisation, I reckon cloud services will be my best bet. Concurrently I can look at some of the side prongs that I want to include in the system as a whole – notably web publication of data and automatic generation of Twitter notifications.

As everyone who’s worked on a solo project knows, I’ve also got a lot more material in my head, or at best sketched in notebooks, that needs writing up. How often has the New Year resolution been: “Write more docs”?

Mini-Seismograph

On the hardware side, until I’ve got my income a bit better sorted out I’m pretty much limited to a rather scattergun approach, using whatever components I have at hand. As well as finishing off the mostly-built VLF receiver, I’ve also got the bits for a basic seismograph. It’ll essentially be :

  • ESP32 microcontroller + comms : core of the subsystem, handling the acquisition and preprocessing of data, which it will expose using a basic web server, accessible from the local network (the ESP32 includes WiFi connectivity).
  • MPU6050 sensors : accelerometer + gyroscope : a tiny MEMS device, connected over I2C.
  • MicroSD card data logging : experience shows that 100% connectivity is implausible, so some local history is very desirable.
  • Tiny RTC card realtime clock : the comms will be async, so accurate local timestamps are a must.

The ESP32 is a remarkably capable little device and I’m reasonably confident of the viability of interfacing the peripherals. Hopefully just a matter of plodding through example code for each, tweaking as needed.
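For my own reference, the acquisition loop is simple enough to sketch out now. This is MicroPython rather than the Arduino C I’ll probably end up using, and the pin numbers and register addresses below are just the usual ESP32/MPU6050 defaults, but the shape of it will be the same:

  from machine import I2C, Pin
  import time

  MPU_ADDR = 0x68                            # MPU6050 default I2C address

  i2c = I2C(0, scl=Pin(22), sda=Pin(21))     # typical ESP32 I2C pins
  i2c.writeto_mem(MPU_ADDR, 0x6B, b'\x00')   # wake the device (PWR_MGMT_1 = 0)

  def to_signed(hi, lo):
      # combine two bytes into a signed 16-bit value
      v = (hi << 8) | lo
      return v - 0x10000 if v & 0x8000 else v

  while True:
      # 14 bytes starting at ACCEL_XOUT_H: accel XYZ, temperature, gyro XYZ
      raw = i2c.readfrom_mem(MPU_ADDR, 0x3B, 14)
      ax = to_signed(raw[0], raw[1])
      ay = to_signed(raw[2], raw[3])
      az = to_signed(raw[4], raw[5])
      # still to do: timestamp from the RTC, log to the microSD card,
      # and serve recent history from the little web server
      time.sleep_ms(10)                      # roughly 100 samples per second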

The MPU6050 sensors are much less sensitive than those of typical seismometers, so only events of significant magnitude are likely to be detectable. It remains to be seen, but I have my suspicions that having 2 different types of sensor in there means it will, with a bit of wrangling, be possible to get more effective sensitivity than the individual sensor data would yield. Either way, once the wiring and code are in place for this setup, it should be trivial to extend it to use a more sensitive sensor. (Note the Raspberry Shake 4D configuration.)

Also…

I’ve got a little tangential project on the go. ELFQuake is in essence about trying to model aspects of a physical system – Earth geology and its electronically-detectable artifacts – creating an analogue in software that captures enough to be able to make useful predictions. I’m also increasingly convinced that the design of the analog circuits between the sensors and the standard data acquisition elements (ADCs etc.) will have a major impact on the potential success of the system. Putting these points together, it shouldn’t seem that off the wall that I’ve been working on the design of an analog computer. (I must admit I also want to play with chaotic systems – something I’ve been messing about with for years.)

 

 

Connecting all the World’s Circuits

I’ve been a bit frustrated in recent weeks by electronic circuit design tools. The typical process is to draw out the circuit schematic, run simulations and then generate/draw PCB layouts etc. Many of the tools (especially on *nix) use SPICE format to represent the circuit topology between the different operations.

The tools I’ve looked at so far all appear to have one major flaw or another.

To give just three examples:

  • gEDA – rather out-of-date, clunky UI
  • KiCad – the netlists it generates aren’t quite compatible with SPICE (circuit simulation) tools
  • Fritzing – the netlists it generates are nothing like SPICE format (I believe it uses XML)

The go-to representation, as far as I’m concerned, for pretty much anything is the Resource Description Framework (RDF). So I had a quick search around to see if anyone had looked at SPICE in RDF before. D’oh! I found a SPICE vocab I’d roughed out on GitHub around 2011. Jeez, my memory.

So it turns out that most of what I might have put in this post, I’ve already written up in Adding SPICE to the Semantic Web.

Just a couple of things to add here.

Why not use JSON? 

Since I did that post, JSON has become fairly ubiquitous – I’m sure it’s now most coders’ go-to representation of data. But in its basic form it isn’t Web-friendly, in the sense that it doesn’t natively support links.

Links could make things much easier to share and find: circuits, components, datasheets etc. (the description of the circuit in RDF would include URLs for the components, which in turn could be associated with their characteristics, their datasheets, etc. etc.).

There’s even a commercial angle. Given the list of components, a bill of materials can be generated. Typically nowadays you have to trawl through vendors to find suitable suppliers, but in RDF the component could be associated with a vendor, with fields like the price and so on. A distributed SPARQL query could figure much of this stuff out automatically.
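To make that concrete, here’s roughly the sort of query I mean, using Python’s rdflib – the sp: vocabulary, property names and circuit.ttl file are all hypothetical placeholders:

  from rdflib import Graph

  g = Graph()
  g.parse("circuit.ttl", format="turtle")   # hypothetical RDF description of the circuit

  # Hypothetical vocabulary: each component links to vendor offers carrying a price.
  BOM_QUERY = """
  PREFIX sp: <http://example.org/spice#>
  SELECT ?component ?vendor ?price WHERE {
      ?component a sp:Component ;
                 sp:offer ?offer .
      ?offer sp:vendor ?vendor ;
             sp:price  ?price .
  }
  ORDER BY ?component ?price
  """

  for component, vendor, price in g.query(BOM_QUERY):
      print(component, vendor, price)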

OK, so why not use JSON? Well, there’s JSON-LD, which is an RDF representation: essentially JSON with links included.

One other idea. In the middle of typing this, I had a brief chat with Reto and told him what I was writing. He wondered whether there might be a role for inference (a good question, given the existence of RDF/OWL reasoners). Hmmm – my immediate response was, yeah, maybe something like consistency-checking a circuit for dangling wires. But Reto made the point that OWL probably wouldn’t be the best kind of reasoning for the job; this might be more of a SHACL use case.

 

Noise and Chaos on the Arduino

Off-topic. I needed to get my head into gear for work-work, and over the weekend I had an odd little idea I wanted to try. So here’s a quick & dirty write-up and video.

After playing with Arduino White Noise the other week, I did a bit of reading up on the Colors of Noise. Particularly interesting is Pink Noise, in which “each octave (halving/doubling in frequency) carries an equal amount of noise energy… This is in contrast with white noise which has equal intensity per frequency interval.” It occurs a lot in nature, but is not entirely trivial to synthesize using either analog or digital processing. (Here’s a fairly accurate analog pink noise generator circuit.)
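On the digital side, one well-known trick is the Voss-McCartney algorithm: sum several white noise sources, each updated half as often as the previous one, which approximates 1/f over roughly as many octaves as you have sources. A rough Python sketch (my own illustration, nothing to do with the linked analog circuit):

  import random

  def pink_noise(n_samples, n_rows=8):
      # Voss-McCartney: sum several white noise "rows", each refreshed
      # half as often as the one before it, plus one per-sample source.
      rows = [random.uniform(-1.0, 1.0) for _ in range(n_rows)]
      samples = []
      for counter in range(1, n_samples + 1):
          # row index = number of trailing zeros in the counter,
          # so row k is refreshed every 2**(k+1) samples
          row = (counter & -counter).bit_length() - 1
          if row < n_rows:
              rows[row] = random.uniform(-1.0, 1.0)
          white = random.uniform(-1.0, 1.0)
          samples.append((sum(rows) + white) / (n_rows + 1))
      return samples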

Mind wandering, this led me on to chaotic signals. These are remarkably easy to slip into in the analog domain – essentially all you need is a non-linear system with feedback (and the right parameters); see this old magazine write-up on non-linear circuits. They’re also easy to generate in the digital domain. The best-known system is probably the Lorenz Attractor.

But there are much simpler discrete systems, notably the Logistic Map. This is just:

x1 = r * x0 * (1-x0)

where r is a constant, x0 is the current value of x, x1 the next value. With values of r between about 3.6 and 4, the thing goes chaotic.

This was pretty easy to plug into the same skeleton code I used for Arduino white noise generation. The result was the same distinctive kind of racket that the analog circuits generate. To provide a bit of control, I put a pot on an analog input, scaling the read value to between 0 and 1 and adding it to 3 to give an interesting range of values for r.

But what I wanted to play with wasn’t just this. One way of generating electronic (and mechanical) chaos is to drive an otherwise periodic system with a periodic signal, as in the chaotic double pendulum. But with pink noise on my mind, I was curious to see what would happen if a chaotic system was driven with white noise.

The code, again using the skeleton I already had, was straightforward. I added another pot. to another analog input to determine the level of the noise signal.

My code is a real hacky mess at the moment, mostly due to hopping between integer and float values, and scaling, but the core of it looks like this (effectively inside a loop):

  // Galois LFSR-based pseudo-random number generator (white noise source)
  unsigned lsb = lfsr & 1;   /* Get LSB (i.e., the output bit). */
  lfsr >>= 1;                /* Shift register */
  lfsr ^= (-lsb) & 0xB400u;  /* Apply the toggle mask only if the output bit was 1 */

  // control values from the two pots
  noise_level = analogRead(NOISE_LEVEL_PIN); // will be 0 - 1023
  r_value = analogRead(R_VALUE_PIN);         // will be 0 - 1023

  r = 3 + ((float)r_value) / 1024;           // r in the interesting range 3 - 4

  noise_scale = ((float)noise_level) / 2048; // noise contribution, 0 - 0.5

  x_scale = 1 - noise_scale;                 // keep x + noise within 0 - 1

  noise = noise_scale * ((float)lfsr) / 65536; // scale the 16-bit LFSR value down to 0 - noise_scale

  x = x_scale * x + noise;   // mix white noise into the current state

  // logistic map
  x = r * x * (1 - x);

  // the value to output
  temp3 = (uint16_t)(x * 65536); // scale & cast to 16 bits

I’ve no idea where this is going…


VLF Receiver Oddments

I’m doing a little more on the simple handheld VLF receiver I’ve been working on. For an electric field receiver, all that’s essentially required is a whip antenna and a high-input-impedance, high-gain audio frequency amplifier. Some filtering is desirable to limit the bandwidth and cut mains hum.

I’ve already soldered up the input & filter stages, and yesterday I breadboarded the output stage – an amplifier to drive a little speaker/headphones. But I’d forgotten a key consideration: how much overall gain the thing should have.

A quick google later, I found this rather nice poster on NASA’s site, “Building and Testing a Portable VLF Receiver”.

[Screenshot: NASA’s “Building and Testing a Portable VLF Receiver” poster]

It doesn’t have the schematic – I expect it’s one of their INSPIRE models. But it does have what I was looking for: the signal is of the order of microvolts, and their overall gain is ×1500 – rather more than I’ve allowed for so far. There’s a feedback resistor change in my near future.

First, though, I reckon I’ll draw up the circuit as it stands (in KiCAD). I can figure out the gain bits from there, and simulate. I also need to check roll-off at the frequency extremes (call it 20Hz & 20kHz). Hmm, a gain of 1500 – that’ll be tempting stability problems.
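For my own sanity, the back-of-envelope version (my numbers, not from the NASA design): with standard non-inverting stages the gain per stage is 1 + Rf/Rg, so something like ×48 (Rf = 47k, Rg = 1k) followed by ×31 (Rf = 30k, Rg = 1k) gives roughly ×1500 overall. Splitting it that way also keeps the gain-bandwidth demands modest – the first stage only needs a GBW of about 48 × 20kHz ≈ 1MHz – and should be a lot less twitchy than trying to do ×1500 in one stage.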

When I was looking for the gain requirements yesterday, I opened a bunch of the results in browser tabs. I found what I was looking for in the first, but am pleased I didn’t close the others. While vlf.it is the site for all things Radio Nature, I did stumble on some material I hadn’t seen before.

This page is notable : VLF Natural Radio Reception at techlib.com. It features a variety of simple receiver designs. One piece of utter genius jumped out at me. The major problem with VLF reception is interference noise, so ideally you want to situate the receiver a long way from sources of that – eg. houses, computers… Which is a pain if you want to record/analyze the signal. Here the author bends a baby monitor transmitter, replacing the mic with a VLF preamp. Voila, instant remote receiver.

I love this :

“The antenna was horizontal and near the ground under my truck for this recording. That turned out to be a questionable location, by the way! Not only did several neighbors become alarmed by it, but a couple of police officers also spotted the thing. I must admit, it does have a bomb-like appearance! It spent the rest of the night under an overturned flower pot with the VLF antenna sticking out the little drain hole in the bottom.”

Another very promising site I ran across, though I’ve still to read it properly, is Larry’s Very Low Frequency Site. Looks like there’s some good material there.

For now, back to the KiCAD.

 

Matching Transistors for Log/Exp Converters

Slightly off-topic again.

I’ve been looking at analog log/exp converters, primarily with music synth applications in mind. Here’s a typical Voltage Controlled Oscillator circuit, which uses a pair of transistors as part of the exponential conversion sub-circuit. But there may well be potential for using an analog log converter to effectively improve the resolution of the ADC part of a seismic data acquisition system. Note that earthquake magnitude measurements are usually expressed as log values – e.g. on the Richter Scale, a magnitude 5 event has an amplitude 10x that of a magnitude 4 event.
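To put rough numbers on that: local magnitude is essentially M = log10(A/A0), so the span from a magnitude 2 to a magnitude 5 event is a factor of 1000 in amplitude – about the entire range of a linear 10-bit ADC. A log front end spreads the converter’s steps more evenly across those decades, instead of spending nearly all of them on the largest events.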

There’s a useful selection of general-purpose log & exp converters in TI Application Note AN-30. When building such circuits from op amps + transistors, there are two factors that can significantly affect accuracy. The first is the effect of temperature on transistor characteristics; this is usually offset by using a temperature-sensitive (‘tempco’) resistor. I don’t currently have any of these… The second issue is that the circuits generally involve a pair of transistors in a balanced configuration, and here it’s useful to select transistors with closely matched characteristics.
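Both issues come straight from the transistor relation these circuits exploit: Vbe = VT * ln(Ic/Is), where VT = kT/q is about 26mV at room temperature. VT drifts with temperature – hence the tempco resistor to compensate the scale factor – and Is varies a lot from device to device, which is why the balanced pair wants matched transistors, so the Is terms cancel between the two halves.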


The classic circuit for testing for matching was given by none other than Dr. Robert Moog:

[Schematic: Dr. Moog’s transistor matching circuit]

More sophisticated variations are described at Music from Outer Space. I’ve got a bag of 100 2N3904 transistors (about €2 from China), so I decided to have a go at finding some matched pairs.

My circuit began with a silly mistake. I’d misread Moog’s circuit, thinking that both test points were floating, not noticing that one was ground. I only realised once I’d got the thing breadboarded. No big deal, and buffering both lines did offer a bit more scope for experimentation. This is what I ended up with:

[Schematic: my version of the matching circuit, with buffered test points]

I used KiCAD for the diagram; the files are on github.

The left-hand side is the same as Moog’s, just with a better op amp and 1% resistors. The right-hand side is a basic instrumentation amplifier consisting of a couple of unity-gain buffers feeding a differential amplifier with a gain of 10. I initially tried a gain of 100 (using 220k rather than 22k around U1C), with a bias voltage (from a pot) on pin 5 of U1B, but this turned out to be over-sensitive – it was too easy to flip the output to one rail or the other.

I didn’t see much point in accurate reference voltages as in the MFOS designs: my 12V is regulated, and after I’d left everything connected for a little while there was too much variation in individual measurements anyway.

To do mass comparisons while avoiding touching the transistors (and warming them up), I stuck 40 of them into a breadboard:

[Photo: 40 transistors in a breadboard]

Moog refers to Vbe values of around 0.6V, and a target of matching within 2mV. I got similar values – 0.573 ±0.001V – with only a couple of exceptions (and even then less than 3mV difference). This seemed a little too good to be true, so I played around with things like changing the bias voltage, but the values still seemed surprisingly closely matched. Then a simple sanity check occurred to me: putting a BC109 under test gave a value of 0.553V. Not matched to the 2N3904s.
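That result makes sense on paper too: at equal collector currents, the offset between two transistors is ΔVbe = VT * ln(Is2/Is1), roughly 26mV times the log of the saturation current ratio. So matching within 2mV means the Is values agree to within about 8%, while the BC109 sitting about 20mV below the 2N3904s corresponds to an Is a couple of times larger – a different process entirely, so no great surprise it doesn’t match.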

So it looks like I got lucky 🙂