Time-Series Forecasting with RNNs

Predicting daily climate data using sequence modeling

What is a Time-Series RNN?

Most ML models treat each row as independent. An RNN does not. It keeps a running hidden state that gets updated at each step, so earlier inputs carry forward and influence later outputs. That internal memory is what makes sequence modeling work.

For time-series data this is a natural fit. Yesterday's temperature carries information about tomorrow's. A sliding window of recent days feeds into the network step by step, and the hidden state accumulates context across the whole window before producing a prediction.
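The recurrence described above can be sketched in a few lines of numpy. This is a minimal illustration, not the project's actual model: the dimensions, weight scales, and random inputs here are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 3 input features, 8 hidden units.
n_in, n_hidden = 3, 8
W_xh = rng.normal(scale=0.1, size=(n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (the recurrence)
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One recurrent update: the new state mixes the current input with the old state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

sequence = rng.normal(size=(7, n_in))  # a 7-step window of readings
h = np.zeros(n_hidden)                 # the state starts empty
for x_t in sequence:
    h = rnn_step(x_t, h)               # earlier steps persist inside h

print(h.shape)  # (8,): one state vector summarizing the whole window
```

Because `h` is threaded through every step, the final state depends on all seven inputs, not just the last one; that is the "memory" the paragraph above refers to.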

The vanilla RNN's memory has real limits though. As sequences grow longer, gradients shrink exponentially during backprop through time — the earliest timesteps stop contributing meaningfully. This is the vanishing gradient problem. LSTMs and GRUs address it using gating mechanisms that selectively retain or discard information across long distances. This project uses the basic RNN on purpose: the 7-day window is short enough that gradient flow stays reasonable, and stripping out the gating lets you see the core recurrent mechanics directly without extra complexity.
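The exponential shrinkage can be seen directly by repeatedly multiplying a gradient vector by the recurrent weights, which is roughly what backprop through time does. This is a hedged sketch: with tanh activations the true step Jacobian is diag(1 - h²) @ W_hh.T, and since the diagonal factor has entries at most 1, using W_hh.T alone if anything understates how fast the real gradient decays.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 8

# Recurrent weights at a typical initialization scale.
W_hh = rng.normal(scale=0.2, size=(n_hidden, n_hidden))

# Backprop through time multiplies the gradient by the step Jacobian
# once per timestep; here we use W_hh.T as a stand-in for that Jacobian.
grad = np.ones(n_hidden)
norms = []
for t in range(50):
    grad = W_hh.T @ grad
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[9], norms[49])  # the norm collapses toward zero
```

After 50 steps the gradient norm is vanishingly small, which is why a 7-day window is comfortable for a vanilla RNN while a 365-day window would not be.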

About This Project

The dataset is Daily Delhi Climate, covering several years of daily weather readings in Delhi. The target is the next day's meantemp, the mean daily temperature. Each sample is built from a 7-day sliding window, so the model sees a full week of readings before predicting the following day.

Preprocessing covers four steps. Temporal features are extracted from the date column (month and day-of-year, since climate has strong seasonality). All inputs are Min-Max scaled to [0, 1]. The data is reshaped into sequential blocks of shape (samples, 7, features). Finally, the RNN is trained on those sequences: each sample is a 7-step sequence, and its label is the meantemp on day 8.
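The pipeline can be sketched end to end with a toy stand-in for the CSV. The column names (`meantemp`, `humidity`) and the synthetic values are assumptions for illustration; the real notebook reads the Kaggle dataset instead.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Delhi climate data: 100 days, two weather columns.
dates = pd.date_range("2013-01-01", periods=100, freq="D")
df = pd.DataFrame({
    "date": dates,
    "meantemp": 20 + 10 * np.sin(2 * np.pi * dates.dayofyear / 365),
    "humidity": np.linspace(40, 80, 100),
})

# 1. Temporal features from the date column.
df["month"] = df["date"].dt.month
df["dayofyear"] = df["date"].dt.dayofyear

# 2. Min-Max scale every input column to [0, 1].
features = ["meantemp", "humidity", "month", "dayofyear"]
scaled = (df[features] - df[features].min()) / (df[features].max() - df[features].min())

# 3. Reshape into sliding windows: X is (samples, 7, features),
#    y is the next day's scaled meantemp.
window = 7
values = scaled.to_numpy()
X = np.stack([values[i : i + window] for i in range(len(values) - window)])
y = scaled["meantemp"].to_numpy()[window:]

print(X.shape, y.shape)  # (93, 7, 4) (93,)
```

Note that window i covers days i..i+6 while its label is day i+7, so inputs and labels never overlap.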

The RNN uses a single hidden layer with 64 units, outputting a single scalar. Training uses MSE loss with Adam. Validation loss plateaus within roughly the first 20 epochs, which is typical for a short input sequence on a smooth climate signal. The main predictive signal is concentrated in the most recent few days, so the 7-day window already captures most of what is available. Adding more depth or a wider window does not meaningfully improve results on this dataset.
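The forward pass of that architecture, and the MSE loss Adam minimizes, can be written out without any framework. This is a sketch of the shapes only: the feature count, weight scales, and random batch are invented, and no training step is shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Shapes matching the text: 7-step windows, 64 hidden units, scalar output.
# The feature count (4) and weight scales are illustrative assumptions.
n_feat, n_hidden, window = 4, 64, 7
W_xh = rng.normal(scale=0.05, size=(n_feat, n_hidden))
W_hh = rng.normal(scale=0.05, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_out = rng.normal(scale=0.05, size=(n_hidden, 1))
b_out = np.zeros(1)

def predict(X):
    """Run the RNN over each 7-step window, then read out a single scalar."""
    h = np.zeros((X.shape[0], n_hidden))
    for t in range(window):
        h = np.tanh(X[:, t, :] @ W_xh + h @ W_hh + b_h)
    return (h @ W_out + b_out).ravel()

X = rng.normal(size=(32, window, n_feat))  # a batch of 32 windows
y = rng.normal(size=32)                    # next-day targets
pred = predict(X)
mse = np.mean((pred - y) ** 2)             # the loss that Adam would minimize
print(pred.shape)  # (32,)
```

Only the final hidden state feeds the output layer, which matches the many-to-one setup described above: a week in, one temperature out.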

See it in Action

Check out the full implementation, dataset processing, and model training in my Kaggle notebook.