I have some stock data in a DataFrame that I'm resampling, and the resampling produces some NaN values. Here's a section of the raw feed:
In [34]: feeddf
Out[34]:
                      open   high    low  close  volume
date
2017-12-03 07:00:00  14.46  14.46  14.46  14.46   25000
2017-12-03 07:01:00  14.46  14.46  14.46  14.46   20917
2017-12-03 07:06:00  14.50  14.50  14.50  14.50    2000
2017-12-03 07:12:00  14.50  14.56  14.50  14.56   17000
The feed is supposed to be minute-by-minute, but when no data is available for a minute, that row is skipped. When I resample the DataFrame and aggregate the opens, highs, lows, and closes, the result looks like this:
In [35]: feeddf.resample('3Min').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'})
Out[35]:
                      open   high    low  close
date
2017-12-03 07:00:00  14.46  14.46  14.46  14.46
2017-12-03 07:03:00    NaN    NaN    NaN    NaN
2017-12-03 07:06:00  14.50  14.50  14.50  14.50
2017-12-03 07:09:00    NaN    NaN    NaN    NaN
2017-12-03 07:12:00  14.50  14.56  14.50  14.56
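To make this easier to reproduce, here is a minimal sketch that rebuilds the four rows shown above and runs the same aggregation. The construction of feeddf is my own reconstruction from the printed output, not the real feed loader, and I've written the frequency as '3min' on the assumption that the lowercase alias is accepted across pandas versions.

```python
import pandas as pd

# Rebuild the sample rows from the printed output (reconstruction, not the real feed)
index = pd.to_datetime([
    "2017-12-03 07:00:00",
    "2017-12-03 07:01:00",
    "2017-12-03 07:06:00",
    "2017-12-03 07:12:00",
])
feeddf = pd.DataFrame(
    {
        "open": [14.46, 14.46, 14.50, 14.50],
        "high": [14.46, 14.46, 14.50, 14.56],
        "low": [14.46, 14.46, 14.50, 14.50],
        "close": [14.46, 14.46, 14.50, 14.56],
        "volume": [25000, 20917, 2000, 17000],
    },
    index=index,
)
feeddf.index.name = "date"

# Same aggregation as in the question; bins with no rows come out as NaN
resampled = feeddf.resample("3min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
)
print(resampled)
```

The 07:03 and 07:09 bins contain no source rows, which is where the all-NaN rows come from.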
My question: I want to forward-fill the missing rows based on the previous row's close value, so that every column in a NaN row takes the last known close. df.fillna(method='ffill') doesn't help, because it fills each column from the last value in that same column. Any ideas?
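To pin down the behaviour I'm after, here is a rough sketch of one possible approach: forward-fill close first, then fill the remaining NaNs in the other columns from that filled close. This is only an illustration, not necessarily the idiomatic way. The resampled frame is rebuilt by hand, and the extra empty 07:15 row is hypothetical; I added it because it is the case where a plain column-wise forward-fill would differ (open and low would stay at 14.50 instead of the last close, 14.56).

```python
import numpy as np
import pandas as pd

nan = np.nan
index = pd.date_range("2017-12-03 07:00:00", periods=6, freq="3min", name="date")

# Resampled output from the question, plus a hypothetical empty 07:15 bin
resampled = pd.DataFrame(
    {
        "open": [14.46, nan, 14.50, nan, 14.50, nan],
        "high": [14.46, nan, 14.50, nan, 14.56, nan],
        "low": [14.46, nan, 14.50, nan, 14.50, nan],
        "close": [14.46, nan, 14.50, nan, 14.56, nan],
    },
    index=index,
)

filled = resampled.copy()

# Step 1: carry the last known close forward into the empty bins
filled["close"] = filled["close"].ffill()

# Step 2: wherever open/high/low are missing, use that carried-forward close
for col in ["open", "high", "low"]:
    filled[col] = filled[col].fillna(filled["close"])

print(filled)
```

With this sketch, each empty bin becomes a flat bar at the previous close. Rows before the first real observation would stay NaN, since there is no earlier close to carry forward.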