Modelling and forecasting time-series data has been one of the cornerstones of Predictive Analytics in the era of Big Data. A plethora of forecasting techniques is available today, and their context can be hard to grasp; yet, as we know, in the war on noise, context is crucial ammunition. To that end, we, Hemanth Sindhanuru & Srinidhi K from LatentView Analytics, are presenting this series of articles, in which we discuss a structured methodology to understand, analyse and forecast time-series data.
The previous articles presented a practical, if not purely statistical, perspective on the bases upon which some of the popular time-series modelling techniques have been formulated. Continuing that discussion, the current article explores the attributes of a time-series, some of which serve as inputs to our modelling approaches whilst others serve as criteria for selecting appropriate models.
One such significant attribute of a time-series is its Periodicity, which is a prerequisite for modelling the seasonal and cyclic components of the time-series.
DEFINING THE PERIODICITY ATTRIBUTE
In simple terms, the Periodicity of a time-series can be defined as the number of data-points or observations in a single season; for instance, monthly data with a yearly seasonal pattern has a periodicity of 12. Consider the following cases:
Here is a summary of some common periods in time-series:
However, time-series drawn from datasets in different domains do not always come with a known frequency or a regular periodicity. In such cases, we need to identify any intrinsic periodic patterns in the data. Certain tools help in exactly this context: identifying the inherent periodicity of a time-series. These tools can be divided into two groups based on their functional nature: analysis of the Time Domain and analysis of the Frequency Domain.
TIME DOMAIN ANALYSIS
Time-domain analysis of a time-series involves observing how the amplitude of the signal varies with respect to time, hence the name. An example of such a tool is the Auto-Correlation Function (ACF).
The Auto-Correlation Function calculates the correlation of a time-series with lagged copies of itself, i.e. the correlation between each observation and the value a fixed number of steps (the lag) before it.
The ACF chart displayed above plots the auto-correlation coefficient (the output of the ACF) against the different lags. A seasonal pattern results in a large auto-correlation coefficient at the seasonal lag (lag = periodicity) and at its multiples. In the above example, the ACF plot shows significantly large auto-correlation coefficients at lags 11, 22, 33 and so on, from which we can infer that the time-series to the left is seasonal with a periodicity of 11.
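To make the idea concrete, here is a minimal sketch of the sample auto-correlation computed with the Python standard library only. The function name autocorrelation and the example series (a pure sinusoid with a period of 11 observations) are our own assumptions for illustration; this is not the series behind the chart, and in practice a statistics library would normally be used.

```python
import math


def autocorrelation(series, max_lag):
    """Sample auto-correlation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    acf = []
    for lag in range(max_lag + 1):
        # Covariance between the series and a copy of itself shifted by `lag`.
        covariance = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
        acf.append(covariance / variance)
    return acf


# Hypothetical example series: a pure sinusoid with a period of 11 observations.
series = [math.sin(2 * math.pi * t / 11) for t in range(220)]
acf = autocorrelation(series, max_lag=40)

# Lag (excluding lag 0) with the largest coefficient among the first 16 lags.
seasonal_lag = max(range(1, 17), key=lambda lag: acf[lag])
print(seasonal_lag)  # 11 for this synthetic series
```

The search is restricted to the first 16 lags only to keep the example simple; on real data one would inspect the whole plot, since the multiples of the seasonal lag (22, 33, ...) also carry large coefficients.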
FREQUENCY DOMAIN (SPECTRAL) ANALYSIS
Understanding the nature of Frequency Domain analysis
Frequency Domain analysis of a time-series involves studying the variation of the amplitude of a time-series with respect to frequency or wavelength. To get a better sense of how analysis of the Frequency Domain differs from that of the Time Domain, let us recall a concept mentioned in the previous articles: a time-series or signal can be interpreted as a combination of sinusoidal components with different frequencies. Frequency Domain analysis helps us deduce the frequencies of these component sinusoids.
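The "combination of sinusoids" view can be sketched by building a series from a few components. The helper composite_series, the amplitudes, the periods (11 and 100 observations) and the noise level below are all assumptions chosen purely for illustration.

```python
import math
import random


def composite_series(n_points, components, noise_sd=0.0, seed=0):
    """Sum of sinusoids given as (amplitude, period) pairs, plus Gaussian noise."""
    rng = random.Random(seed)
    return [
        sum(amp * math.sin(2 * math.pi * t / period) for amp, period in components)
        + rng.gauss(0.0, noise_sd)
        for t in range(n_points)
    ]


# Hypothetical components: a short cycle (period 11) and a long cycle (period 100).
series = composite_series(1100, [(1.0, 11), (2.0, 100)], noise_sd=0.5)
print(len(series))
```

Frequency Domain analysis works in the opposite direction: given only the series, it tries to recover the periods of the components that were mixed together.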
Periodogram: a tool for Frequency Domain Analysis
One of the tools we use for spectral analysis is a graph called the Periodogram. Spectral Density is plotted on the Y-axis and Frequency on the X-axis. Fundamentally, a Periodogram plots the variance per unit frequency (Y-axis) over a range of possible frequencies (X-axis).
• The area under the curve bounded by two frequencies on the X-Axis represents the variance in that specific frequency band.
• A peak on the graph, or spectrum, represents relatively high variance in the frequency band centred on the peak. Hence, the frequency corresponding to a local maximum of the Spectral Density in the Periodogram can be interpreted as a dominant seasonal cycle in the time-series.
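As a rough sketch of how such a graph can be produced, the raw Periodogram can be computed from the discrete Fourier transform of the mean-centred series. The code below uses only the Python standard library; the function name periodogram, the scaling (squared magnitude divided by the series length, one of several conventions in use) and the synthetic two-cycle series are our assumptions, and a library routine would normally be preferred on real data.

```python
import cmath
import math


def periodogram(series):
    """Raw periodogram: (frequency, power) pairs for frequencies k/n, k = 1..n//2."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]  # remove the mean so frequency 0 does not dominate
    result = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier transform coefficient at k/n cycles per observation.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred))
        result.append((k / n, abs(coeff) ** 2 / n))
    return result


# Hypothetical example series with two cycles: periods of 11 and 100 observations.
series = [
    math.sin(2 * math.pi * t / 11) + 2.0 * math.sin(2 * math.pi * t / 100)
    for t in range(1100)
]
spectrum = periodogram(series)

# The two largest ordinates mark the dominant cycles.
top_two = sorted(spectrum, key=lambda fp: fp[1], reverse=True)[:2]
for frequency, power in sorted(top_two):
    print(round(frequency, 3), round(1 / frequency))  # 0.01 100, then 0.091 11
```

The series length of 1100 is chosen so that both cycles fall exactly on the computed frequencies; on real data the peaks are usually spread over neighbouring frequencies.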
The Periodogram for the time-series considered previously for the ACF plot is displayed above. It shows two local maxima, one at frequency 0.091 and the other at frequency 0.01. The periodicity can be computed from the classic relation between frequency and period in Physics: Periodicity = 1 / Frequency.
The Periodogram shown above therefore implies that the time-series under consideration has multiple seasonal cycles, one with a periodicity of about 11 (1/0.091) and the other with a periodicity of 100 (1/0.01), which is in agreement with the inference from the ACF plot as well as the time-series plot.
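The conversion from a peak frequency to a periodicity is simple arithmetic, sketched below for the two peak frequencies quoted above. The helper name period_from_frequency is ours; frequencies are assumed to be expressed in cycles per observation.

```python
def period_from_frequency(frequency):
    """Periodicity (observations per cycle) for a frequency in cycles per observation."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency


for peak_frequency in (0.091, 0.01):
    period = period_from_frequency(peak_frequency)
    # Periodicity is a count of observations, so round to the nearest integer.
    print(peak_frequency, round(period, 2), round(period))
```

The first peak gives roughly 10.99, which rounds to 11 observations per cycle; the second gives 100.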
Smoothing the Periodogram
It should be noted that the Periodogram is a discrete function, i.e. it calculates spectral densities on the Y-axis only for a finite set of particular frequencies on the X-axis. Sometimes, due to a high level of noise in the time-series, the raw Periodogram doesn’t show any clear maximum outright. In such cases, we can smooth the Periodogram to dampen its variance and identify a significant peak clearly.
However, smoothing comes at the cost of accuracy. In datasets with very strong inherent seasonal variation, the dampening does not affect the significant peak, but in time-series with a low degree of seasonality, smoothing might result in over-dampening and hence a loss of precision in the location of the significant peak.
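The over-dampening effect can be sketched on a small, hand-made set of Periodogram ordinates. The values below are hypothetical, and a plain centred moving average is assumed as the smoother; other smoothing kernels are common. The helper peak_width counts how many ordinates lie within 10% of the maximum, as a crude measure of how precisely the peak is located.

```python
def smooth(values, half_width):
    """Centred moving average over 2*half_width + 1 points (window shrinks at the edges)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def peak_width(values, fraction=0.9):
    """Number of ordinates at or above `fraction` of the maximum."""
    threshold = fraction * max(values)
    return sum(1 for v in values if v >= threshold)


# Hypothetical raw ordinates with a single peak at index 7.
raw = [1, 1, 1, 1, 1, 2, 4, 10, 4, 2, 1, 1, 1, 1, 1]
light = smooth(raw, half_width=1)  # 3-point average
heavy = smooth(raw, half_width=3)  # 7-point average

for name, values in (("raw", raw), ("light", light), ("heavy", heavy)):
    print(name, round(max(values), 2), peak_width(values))
```

With the light smoother the peak is lower but still falls on a single ordinate; with the heavy smoother it flattens into a plateau several ordinates wide, so its exact location can no longer be read off.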
Keep watching this space for more