Categories
2. You Should Know about Time Series Forecasting in R

Create a Box Plot by Cycle

boxplot(AirPassengers ~ cycle(AirPassengers), xlab = "Date", ylab = "Passenger Numbers (1000's)", main = "Monthly air passengers boxplot from 1949-1960")

From the above plot, you can see that passenger numbers are higher in June, July, and August than in the other months of the year.
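
To back up the visual impression with numbers, the monthly averages can be computed directly. This is a minimal sketch using only base R; the variable names (month_index, monthly_mean, top3) are ours and not part of the original walkthrough.

```r
# Average passengers (in thousands) for each calendar month across 1949-1960.
month_index <- as.integer(cycle(AirPassengers))   # 1 = January, ..., 12 = December
by_month <- split(as.numeric(AirPassengers), month_index)
monthly_mean <- vapply(by_month, mean, numeric(1))
names(monthly_mean) <- month.abb

# The three months with the highest average passenger counts.
top3 <- names(sort(monthly_mean, decreasing = TRUE))[1:3]

print(round(monthly_mean, 1))
print(top3)
```

The three names printed last should match the summer peak visible in the box plot.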

  • Build the ARIMA Model Using auto.arima() Function

mymodel <- auto.arima(AirPassengers)

mymodel

  • Plot the Residuals

plot.ts(mymodel$residuals)

  • Forecast the Values for the Next 10 Years

myforecast <- forecast(mymodel, level=c(95), h=10*12)

plot(myforecast)
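
To see what the 95% level means, here is a rough base R equivalent that does not need the forecast package. It is a sketch under assumptions: we fit the model with stats::arima() using the (2, 1, 1) order quoted in this article, and we assume a seasonal difference (0, 1, 0) with period 12, which the article does not state. The names fit, fc, lower, and upper are ours.

```r
# Assumed specification: non-seasonal (2, 1, 1) as quoted in the article,
# plus a seasonal difference (0, 1, 0) with period 12 (our assumption).
fit <- arima(AirPassengers, order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12))

# Point forecasts and standard errors for the next 10 years of months.
fc <- predict(fit, n.ahead = 10 * 12)

# Approximate 95% interval: forecast +/- about 1.96 standard errors.
z <- qnorm(0.975)
lower <- fc$pred - z * fc$se
upper <- fc$pred + z * fc$se

# First forecast year: lower bound, point forecast, upper bound.
print(round(cbind(lower, forecast = fc$pred, upper)[1:12, ], 1))
```

The interval widens as the horizon grows, which is why the shaded band in the forecast plot fans out over the 10 years.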

  • Validate the Model by Selecting Lag Values

Box.test(mymodel$residuals, lag = 5, type = "Ljung-Box")

Box.test(mymodel$residuals, lag = 10, type = "Ljung-Box")

Box.test(mymodel$residuals, lag = 15, type = "Ljung-Box")

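The Ljung-Box test checks the null hypothesis that the residuals are not autocorrelated. When the p-values at these lags stay above the usual 0.05 threshold, there is no evidence of leftover autocorrelation in the residuals, and we can conclude that the ARIMA model with parameters (2, 1, 1) adequately fits the data.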
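
To get a feel for how Ljung-Box p-values behave, it can help to run the test on simulated data where the answer is known. The sketch below uses made-up series (white noise and a strongly autocorrelated AR(1) process); the seed and the coefficient 0.8 are arbitrary choices of ours.

```r
set.seed(42)

# White noise: no autocorrelation by construction.
white_noise <- rnorm(500)

# AR(1) series: each value depends strongly on the previous one.
ar_series <- arima.sim(model = list(ar = 0.8), n = 500)

lb_noise <- Box.test(white_noise, lag = 10, type = "Ljung-Box")
lb_ar <- Box.test(ar_series, lag = 10, type = "Ljung-Box")

# A small p-value signals autocorrelation; a large one does not.
print(lb_noise$p.value)
print(lb_ar$p.value)
```

Residuals from an adequate model should behave like the white-noise case, not like the AR(1) case.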

Load the Forecast Package into RStudio

install.packages("forecast")

library(forecast)

  • Load the Air Passengers’ Dataset and View Its Class

data("AirPassengers")

class(AirPassengers)

[1] "ts"

Here, ts indicates that the dataset is a time series object.

  • Display the Dataset

AirPassengers

Let’s check the start and end dates:

start(AirPassengers)

[1] 1949    1

end(AirPassengers)

[1] 1960   12

So, our start date is January 1949, while the end date is December 1960.
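
The same start, end, and frequency information applies to any ts object you build yourself. As a small sketch, here is how a monthly series could be constructed with ts(); the sales figures and the name sales_ts are hypothetical and only serve as an illustration.

```r
# Hypothetical monthly sales figures (made-up values for illustration).
sales <- c(120, 135, 128, 150, 162, 171)

# start = c(2020, 1) means January 2020; frequency = 12 means monthly data.
sales_ts <- ts(sales, start = c(2020, 1), frequency = 12)

print(class(sales_ts))
print(start(sales_ts))
print(end(sales_ts))
print(frequency(sales_ts))
```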

  • Find out If There Are Any Missing Values

sum(is.na(AirPassengers))

[1] 0

  • Check the Summary of the Dataset

summary(AirPassengers)

  • Plot the Dataset

plot(AirPassengers)

  • Decompose the Data Into Four Components

tsdata <- ts(AirPassengers, frequency = 12) 

ddata <- decompose(tsdata, "multiplicative")

plot(ddata)
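
In a multiplicative decomposition, the observed series is the product of the trend, seasonal, and random components. The sketch below checks this by multiplying the components back together; the names rebuilt and ok are ours. The trend is estimated with a centered moving average, so it is missing (NA) at both ends of the series.

```r
tsdata <- ts(AirPassengers, frequency = 12)
ddata <- decompose(tsdata, "multiplicative")

# Multiply the three estimated components back together.
rebuilt <- ddata$trend * ddata$seasonal * ddata$random

# Positions where the moving-average trend could be computed.
ok <- !is.na(rebuilt)

# Number of missing positions at the edges of the series.
print(sum(!ok))

# Where the trend exists, the product reproduces the observed values.
print(all.equal(as.numeric(rebuilt[ok]), as.numeric(tsdata[ok])))
```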

  • Plot the Different Components Individually

plot(ddata$trend)

plot(ddata$seasonal)

plot(ddata$random)

  • Plot a Trendline on the Original Dataset

plot(AirPassengers)

abline(reg=lm(AirPassengers~time(AirPassengers)))
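
The trendline is an ordinary linear regression of passenger numbers on time, so its slope can be read off the fitted model. A minimal sketch, with trend_fit and slope as our own names:

```r
# Regress passenger numbers on time (time() is measured in years).
trend_fit <- lm(AirPassengers ~ time(AirPassengers))

# Slope: average change in passengers (thousands) per year.
slope <- unname(coef(trend_fit)[2])
print(slope)
```

A positive slope confirms the upward trend seen in the plot.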

Using the ARIMA Model for Time Series Forecasting

ARIMA models are classified by three factors:

p = Number of autoregressive terms (AR)

d = How many non-seasonal differences are needed to achieve stationarity (I)

q = Number of lagged forecast errors in the prediction equation (MA)
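
The d parameter is the easiest of the three to see in code, since differencing is just diff() in base R. The sketch below uses a made-up linear series (linear_trend is our name) and then applies one and two differences to AirPassengers.

```r
# A toy series with a perfectly linear trend (made-up values).
linear_trend <- ts(c(10, 12, 14, 16, 18))

# One difference (d = 1) removes a linear trend: every step is the same size.
print(diff(linear_trend))

# The same operation on the real data; each difference shortens the series by one.
d1 <- diff(AirPassengers, differences = 1)
d2 <- diff(AirPassengers, differences = 2)
print(c(length(AirPassengers), length(d1), length(d2)))
```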

In this demo, we’ll use the AirPassengers dataset, which records monthly totals of international airline passengers (in thousands) from 1949 to 1960. We’ll forecast passenger numbers from 1961 onward using the ARIMA model in R.

The idea of this analysis is to identify the time series components, which are:

  • Trend 
  • Seasonality
  • Random behavior of data

Then, we’ll forecast the values based on historical data.

Methods of Time Series Forecasting

  • ARIMA Model

ARIMA stands for Autoregressive Integrated Moving Average. It combines the Autoregressive (AR) and Moving Average (MA) models. The AR model forecast corresponds to a linear combination of past values of the variable. The MA model forecast corresponds to a linear combination of past forecast errors. The “I” (Integrated) refers to differencing: each data value is replaced by the difference between it and the previous value.
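
The three building blocks can be simulated by hand in a few lines. This is an illustrative sketch, not part of the AirPassengers analysis; the coefficient 0.7, the seed, and the names ar1, ma1, and integrated are arbitrary choices of ours.

```r
set.seed(123)
n <- 200
e <- rnorm(n)   # random shocks (the "forecast errors")

# AR(1): each value is a multiple of the previous value plus a new shock.
ar1 <- numeric(n)
ar1[1] <- e[1]
for (t in 2:n) ar1[t] <- 0.7 * ar1[t - 1] + e[t]

# MA(1): each value is the current shock plus a multiple of the previous shock.
ma1 <- e + 0.7 * c(0, e[-n])

# I: a cumulative sum "integrates" a series; differencing undoes it.
integrated <- cumsum(ar1)
print(all.equal(diff(integrated), ar1[-1]))
```

Differencing the integrated series gives back the AR(1) values, which is what the d parameter of an ARIMA model does before the AR and MA parts are fitted.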

  • SARIMA Model

SARIMA stands for Seasonal Autoregressive Integrated Moving Average. It extends the ARIMA model by adding a linear combination of seasonal past values and forecast errors.
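
As a sketch of what a seasonal model looks like in code, base R's arima() accepts a seasonal argument. The example below fits the classic "airline model", ARIMA(0, 1, 1)(0, 1, 1) with period 12, to the log of AirPassengers. This specification is our choice for illustration and is not the model selected elsewhere in this article.

```r
# Seasonal ARIMA: non-seasonal order (0, 1, 1) plus seasonal order (0, 1, 1)
# with a period of 12 months, fitted to the log-transformed series.
fit_sarima <- arima(log(AirPassengers), order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))

# ma1 is the non-seasonal MA term; sma1 is the seasonal MA term.
print(coef(fit_sarima))
```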

  • VAR

The Vector Autoregression (VAR) method models the next step in each time series using an AR model. The VAR model is useful when you are interested in predicting multiple time series variables using a single model.
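
To make the idea concrete, here is a minimal sketch of a VAR(1) on two simulated series, estimated with ordinary least squares through lm() instead of a dedicated VAR package. The coefficient matrix A, the seed, and all variable names are made up for illustration.

```r
set.seed(1)
n <- 300

# True coefficient matrix: row i gives how series i depends on
# yesterday's values of series 1 and series 2.
A <- matrix(c(0.5, 0.2,
              0.1, 0.4), nrow = 2, byrow = TRUE)

# Simulate two series that depend on each other's past.
y <- matrix(0, nrow = n, ncol = 2)
for (t in 2:n) y[t, ] <- as.numeric(A %*% y[t - 1, ]) + rnorm(2)

# Regress today's values on yesterday's values of both series (no intercept).
current <- y[-1, ]
lagged <- y[-n, ]
var_fit <- lm(current ~ lagged - 1)

# Rows of A_hat correspond to the equations for series 1 and series 2.
A_hat <- unname(t(coef(var_fit)))
print(round(A_hat, 2))
```

The estimated matrix should land close to A, showing that one model captures both series and their cross-dependence.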

  • LSTM

The Long Short Term Memory network or LSTM is a special kind of recurrent neural network that deals with long-term dependencies. It can remember information from past data and is capable of learning order dependence in sequence prediction problems.

Components of Time Series

To use time-series data and develop a model, you need to understand the patterns in the data over time. These patterns are classified into four components, which are:

  • Trend

It represents the gradual change in the time series data. The trend pattern depicts long-term growth or decline.

  • Level

It refers to the baseline values for the series data if it were a straight line.

  • Seasonality

It represents the short-term patterns that occur within a single unit of time and repeat indefinitely.

  • Noise

It represents irregular variations and is purely random. These fluctuations are unforeseen, unpredictable, and cannot be explained by the model.
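
The four components can be illustrated by building a synthetic series from them. This is a sketch with made-up numbers: the level of 100, the slope, the seasonal amplitude, and the noise level are all arbitrary assumptions.

```r
set.seed(7)
n_months <- 48
t_index <- seq_len(n_months)

level <- 100                                     # baseline value
trend <- 0.5 * t_index                           # gradual long-term growth
seasonality <- 10 * sin(2 * pi * t_index / 12)   # pattern repeating every 12 months
noise <- rnorm(n_months, sd = 3)                 # irregular, unpredictable variation

# An additive series: the sum of the four components.
series <- ts(level + trend + seasonality + noise,
             start = c(2020, 1), frequency = 12)

print(round(series[1:12], 1))
```

Plotting series would show the same ingredients that decompose() tries to separate again.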

Applications of Time Series Forecasting

  • Time series forecasting is used in stock price prediction to predict the closing price of the stock on each given day.
  • E-Commerce and retail companies use forecasting to predict sales and units sold for different products.
  • Weather prediction is another application that can be done using time series forecasting.
  • Government departments use it to predict the population of a state, of a particular region, or of the nation as a whole.

What Is Time Series Forecasting?

Time series forecasting is the method of exploring and analyzing time-series data recorded or collected over a set period of time. This technique is used to forecast values and make future predictions. Not all data that has time or date values among its features can be considered time series data. Any data fit for time series forecasting should consist of observations taken at regular, continuous intervals.
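
One quick way to check the "regular interval" requirement is to look at the gaps between consecutive timestamps. The helper below, is_regular(), is a hypothetical function of ours and assumes daily data stored as Date values; calendar-based spacing such as monthly data would need a different check, because months differ in length.

```r
# Made-up daily timestamps: one evenly spaced, one with gaps.
regular <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"))
irregular <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-05", "2024-01-09"))

# TRUE when every gap between consecutive dates is the same number of days.
is_regular <- function(dates) {
  gaps <- as.numeric(diff(sort(dates)))
  length(unique(gaps)) == 1
}

print(is_regular(regular))
print(is_regular(irregular))
```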