Exoplanet Signal Extraction: From Raw Sensor Cubes to Smoothed Spectra

How a calibration-first pipeline and moving-average smoothing stabilized a silver-medal Kaggle solution.

October 31, 2024 · 5 min read
Kaggle · Astronomical Signal Processing · Spectral Denoising · Time Series · Noise Reduction
01 · Summary

Reflections on a silver-medal Kaggle solution for the Ariel Data Challenge 2024 — a lightweight signal-processing pipeline where moving-average smoothing mattered more than a larger model.

What this piece covers

The preprocessing pipeline, the transit-depth fit, and why smoothing across nearby wavelengths reduced noisy spectral estimates.

Current state

Silver-medal Kaggle competition note based on a lightweight code pipeline; the team placed 57th out of 1,151 teams in the NeurIPS Ariel Data Challenge 2024.

02 · How I think
CONTENT

This competition looks like a multimodal machine-learning problem, but my solution ended up being much more about signal engineering than model size. The raw inputs are large sensor cubes from FGS1 and AIRS-CH0, yet the first useful step was not to build a bigger learner. It was to calibrate hard, remove obvious nuisance variation, and compress the data into something stable enough that the transit signal could still be seen.

The pipeline does that aggressively. It applies gain and offset correction, linearity correction, dark subtraction, hot and dead pixel masking, flat-fielding, wavelength cropping, center-pixel selection, correlated double sampling, and time binning. After spatial averaging, both sensors are reduced to a compact time-by-wavelength representation. That reduction is the practical foundation of the whole approach: it turns a heavy 3D denoising problem into a much lighter 2D spectral-time problem.
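As a rough illustration of how such a reduction could look, here is a minimal NumPy sketch covering only a subset of the steps (gain/offset, dark subtraction, pixel masking, flat-fielding, correlated double sampling, time binning, spatial averaging). The function name `reduce_cube`, the argument layout, the gain/offset convention, and the assumption that reads alternate between exposure start and end are all illustrative choices, not the actual competition code. Linearity correction, wavelength cropping, and center-pixel selection are omitted for brevity.

```python
import numpy as np

def reduce_cube(raw, gain, offset, dark, flat, bad, dt, bin_size):
    """Reduce a raw cube (reads, rows, wavelengths) to a (time, wavelengths) array.

    Hypothetical helper: reads are assumed to alternate exposure start / end,
    `dark`, `flat`, `bad` are per-pixel frames, `dt` is the integration time per read.
    """
    x = raw.astype(np.float64) / gain + offset   # gain/offset correction (assumed convention)
    x = x - dark[None] * dt[:, None, None]       # dark current scaled by integration time
    x = np.where(bad[None], np.nan, x)           # mask hot/dead pixels as NaN
    x = x / flat[None]                           # flat-field correction
    cds = x[1::2] - x[0::2]                      # correlated double sampling: end minus start
    n = (cds.shape[0] // bin_size) * bin_size    # drop frames that do not fill a bin
    binned = cds[:n].reshape(-1, bin_size, *cds.shape[1:]).mean(axis=1)
    return np.nanmean(binned, axis=1)            # average over spatial rows, ignoring masked pixels
```

The output has one row per time bin and one column per wavelength, which is the compact 2D representation the later steps operate on.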

The spectrum estimate itself stays deliberately simple. First, I detect the transit window from an averaged light curve. Then I solve for a scalar depth that, once the in-transit flux is corrected by it, leaves as little of the in-transit step as possible after fitting a low-order polynomial baseline. The optimizer is almost secondary here, because this is just a one-dimensional fit. The important improvement was refusing to trust each wavelength independently. Nearby wavelengths share structure, so I estimated depth using local wavelength windows and then applied a moving-average smoothing step to stabilize the final spectrum. That was the main optimization in practice: fewer spiky channel-wise estimates, less noise, and a cleaner submission.
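The depth fit described above can be sketched in a few lines. This is a minimal version under stated assumptions, not the submitted code: the function names `find_transit` and `fit_depth`, the gradient-based window detection, the `margin` that excludes ingress and egress samples, and the plain grid search standing in for the one-dimensional optimizer are all illustrative choices.

```python
import numpy as np

def find_transit(lc, width=5):
    """Estimate ingress/egress indices from the steepest drop and rise of a smoothed light curve."""
    smooth = np.convolve(lc, np.ones(width) / width, mode="valid")
    grad = np.gradient(smooth)
    half = width // 2  # "valid" convolution shifts indices by half a window
    return int(np.argmin(grad)) + half, int(np.argmax(grad)) + half

def fit_depth(lc, ingress, egress, margin=5, deg=2, grid=None):
    """Find the scalar depth s that best removes the in-transit step.

    In-transit flux is divided by (1 - s); the best s is the one for which a
    low-order polynomial fits the corrected curve with the smallest residual.
    """
    t = np.linspace(-1.0, 1.0, lc.size)
    idx = np.arange(lc.size)
    in_transit = (idx >= ingress + margin) & (idx < egress - margin)
    out = (idx < ingress - margin) | (idx >= egress + margin)
    keep = in_transit | out  # samples near ingress/egress are ignored
    if grid is None:
        grid = np.linspace(0.0, 0.02, 2001)  # assumed search range for the depth

    def cost(s):
        corrected = np.where(in_transit, lc / (1.0 - s), lc)
        coef = np.polyfit(t[keep], corrected[keep], deg)
        return np.mean((corrected[keep] - np.polyval(coef, t[keep])) ** 2)

    costs = np.array([cost(s) for s in grid])
    return float(grid[np.argmin(costs)])
```

Because the search is over a single scalar, a dense grid is enough here; a bounded scalar minimizer would do the same job with fewer evaluations.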

The rest of the pipeline follows the same philosophy. A small wavelength-wise scaling step corrects systematic bias seen on the training set, and the uncertainty prediction stays lightweight rather than heavily modeled. The broader lesson is simple: in noisy scientific competitions, a compact pipeline can outperform a more ambitious model if the denoising choices are made in the right place. Here, the silver-medal result came less from complexity and more from respecting the measurement process, simplifying early, and smoothing exactly where the noise was hurting most.
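One simple way to realize the wavelength-wise scaling mentioned above is a per-channel least-squares factor fitted on the training set. The sketch below assumes that form; the names `fit_wavelength_scale` and `residual_sigma` are hypothetical, and the constant per-wavelength uncertainty taken from training residuals is only one plausible reading of "lightweight", not a description of the actual submission.

```python
import numpy as np

def fit_wavelength_scale(pred, truth):
    """Per-wavelength least-squares scale so that scale * pred approximates truth.

    Both arrays have shape (planets, wavelengths).
    """
    return (pred * truth).sum(axis=0) / (pred * pred).sum(axis=0)

def residual_sigma(pred, truth, scale):
    """One constant uncertainty per wavelength from training residuals (assumed simple choice)."""
    return np.std(truth - scale * pred, axis=0)
```

At inference time the fitted `scale` would simply multiply each predicted spectrum, and `sigma` would be reported unchanged for every planet.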

Core Tension

A detailed per-wavelength estimate is not automatically a stable one.

The main tradeoff here was between local spectral detail and robust denoising. With very faint transit signals, letting nearby wavelengths support each other was more reliable than treating every channel as fully independent.

Research Shift

Smooth the spectrum before making the model more complicated.

The practical move was to stop overfitting wavelength-level noise and instead use local window averages plus a final moving-average pass to stabilize the spectrum.
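The two smoothing steps can be sketched as follows, assuming a (time, wavelengths) flux array and a per-wavelength depth spectrum. The function names, the window sizes, and the edge handling (shrinking the window at the ends of the spectrum) are illustrative assumptions rather than the settings used in the competition.

```python
import numpy as np

def local_window_lightcurves(flux, half_width=2):
    """Average each wavelength channel with its neighbours before fitting depth.

    flux has shape (time, wavelengths); windows shrink at the spectrum edges.
    """
    n_wl = flux.shape[1]
    out = np.empty_like(flux, dtype=np.float64)
    for w in range(n_wl):
        lo, hi = max(0, w - half_width), min(n_wl, w + half_width + 1)
        out[:, w] = flux[:, lo:hi].mean(axis=1)
    return out

def moving_average(spectrum, window=5):
    """Centered moving average over wavelength (odd window assumed), normalized at the edges."""
    kernel = np.ones(window)
    total = np.convolve(spectrum, kernel, mode="same")
    count = np.convolve(np.ones_like(spectrum), kernel, mode="same")
    return total / count  # divide by the number of samples actually inside the window
```

The first function trades spectral resolution for signal-to-noise before the fit; the second removes remaining channel-to-channel spikes from the fitted depths.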