I wanted to correlate some unusual data with something that could generate returns, so I went browsing on the NASA Earthdata website.
There, I found a lot of compelling datasets. Agricultural signals seemed likely to have a solid link to market prices. I found the 16-day MODIS NDVI/EVI dataset at 250 m resolution (example below) and downloaded 20 years of data for north-eastern South Africa. South Africa's markets seemed developed and stable, but inefficient enough to turn a reasonable profit.
I thought I had better avoid the more commonly analysed Europe and North America, out of fear of competing with Jane Street and friends.
It took quite a long time to create effective features. I took the 16-day mean for both NDVI and EVI, then wrote a script to derive features from them (rolling mean, median, lagged windows, etc.). I ended up with 168 features, many of them with an absurd variance inflation factor (VIF); the top 10 were infinite. Most of them had a reasonable correlation with price, so I manually tried combinations of them to find a sweet spot. Once I had the sweet spot, I ran a grid search over hyperparameters, then another over combinations of features.
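To make the infinite VIF values less mysterious, here is a minimal sketch of how rolling and lagged features derived from one series end up perfectly collinear. It is not the original script: the `build_features` and `vif` helpers, the window and lag sizes, and the synthetic NDVI series are all assumptions for illustration (460 points stands in for roughly 20 years of 16-day composites).

```python
import numpy as np
import pandas as pd


def build_features(series: pd.Series, windows=(3, 6, 12), lags=(1, 2, 4)) -> pd.DataFrame:
    """Hypothetical helper: raw value plus rolling means/medians and lags."""
    feats = {series.name: series}
    for w in windows:
        feats[f"{series.name}_mean_{w}"] = series.rolling(w).mean()
        feats[f"{series.name}_median_{w}"] = series.rolling(w).median()
    for lag in lags:
        feats[f"{series.name}_lag_{lag}"] = series.shift(lag)
    # Drop the warm-up rows where the longest window or lag is undefined.
    return pd.DataFrame(feats).dropna()


def vif(features: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    feature j on all the other features."""
    X = features.to_numpy(dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # centred, so no intercept needed
    out = {}
    for j, name in enumerate(features.columns):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / (y @ y)
        # A feature fully explained by the others has an infinite VIF.
        out[name] = np.inf if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)
    return pd.Series(out).sort_values(ascending=False)


# Synthetic stand-in for an NDVI series: seasonal cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(460)
ndvi = pd.Series(
    0.5 + 0.2 * np.sin(2 * np.pi * t / 23) + rng.normal(0, 0.02, t.size),
    name="ndvi",
)

features = build_features(ndvi)
vif_scores = vif(features)
print(vif_scores)
```

In this toy setup the 3-step rolling mean is exactly the average of the raw value, lag 1 and lag 2, so all four of those columns are linear combinations of each other and their VIF is infinite. With 168 features built the same way, relationships like this are hard to avoid.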
Up until this point, I had been using an LSTM model. I sent some of my results to my friend James, who was very kind in letting me know (in a qualified and solicited manner) that I had severe overfitting. This was news to me, and warranted a nap.
In an attempt to mitigate the overfitting (from what I assumed were dodgy features), I switched to a ridge regression model. I had read online that ridge regression copes well with highly correlated features, because its penalty shrinks their weights instead of letting them blow up. I know there is still a lot of work to do to make this a viable trading strategy, and I will validate it further in the future, but I had fun developing it to this level. Here are some (interactive) results from my ridge model. The R² score is inflated here, as it comes from a particularly lucky train/validation/test split. Average R² scores were a bit lower, at 0.55.
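One way to keep a single lucky split from flattering the score is to evaluate over several chronological splits and report the average. The sketch below does this with scikit-learn's `Ridge` and `TimeSeriesSplit`. It is an assumption about how such a check could look, not the setup used for the results above: the data is synthetic, and `walk_forward_r2`, `alpha=1.0` and `n_splits=5` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: five near-duplicate features driven by one hidden signal,
# mimicking a highly collinear feature set.
rng = np.random.default_rng(0)
n = 460
signal = rng.normal(size=n)
X = np.column_stack([signal + rng.normal(scale=0.05, size=n) for _ in range(5)])
y = 2.0 * signal + rng.normal(scale=0.5, size=n)


def walk_forward_r2(X, y, alpha=1.0, n_splits=5):
    """Fit ridge on each expanding training window and score the next block."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        # The scaler is fitted on the training fold only, to avoid leakage.
        model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
        model.fit(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    return np.array(scores)


scores = walk_forward_r2(X, y)
print("per-split R2:", np.round(scores, 3))
print("mean R2:", round(float(scores.mean()), 3))
```

Each test block comes strictly after its training data, which matters for price series where shuffled splits leak future information. Reporting the spread of per-split scores alongside the mean shows how much a single split can mislead.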