ETHEREUM PRICE PREDICTION USING MACHINE LEARNING METHODS
Robert Kozub
University of Texas at Dallas
Knowledge Mining
Spring 2022
KOZUB 2022
Table of Contents
Introduction
Methods/Sample Data
Data Exploration
Linear Regression
Prophet Model
Conclusion
References
Contact
Introduction
Ethereum is a virtual machine available around the world and powered by blockchain technology; its token is ether (ETH).
The Ethereum network uses ETH to pay for work done on the blockchain (Investopedia, 2022). A blockchain is a distributed database that is shared on a peer-to-peer network. No changes can be made to any "block" without consensus among the network's participants, which makes it more secure for transactions. Ethereum was created to be decentralized and is one of the most preferred blockchains by developers and enterprises. The token, ETH, was made to be used in the blockchain network, but can also be used as a currency.
Since Vitalik Buterin proposed Ethereum in 2013, the cryptocurrency has skyrocketed in price from $1.25 to around $2,500 today (May 2022). The all-time high for ETH was approximately $4,800 in November 2021. Cryptocurrency as a whole is extremely volatile, even more so than the stock market. Nevertheless, many analysts are able to make predictions and forecasts regarding certain stocks depending on factors such as market health and company success. But is there a way to predict or make forecasts on crypto? More specifically, is there a way to predict how much Ethereum is going to fluctuate in the future? And if so, what methods are appropriate for predicting ETH prices in the near future?
This paper researches Machine Learning methods (Linear Regression and Forecasting with Prophet) using R (RStudio) to determine whether ETH prices can be predicted within a certain degree of accuracy. The models produced are not intended for actual investment and should not be used for such purposes. This exploration is purely academic, with the goal of seeing whether the Linear Regression and Prophet models can project the same trends as the actual prices. If successful, further studies may be conducted to compare trends of ETH with external factors such as the stock market, inflation, and other cryptocurrencies.
Figure 1: Blockchain Chart (Investopedia)
Methods
Machine Learning is a form of Artificial Intelligence (AI) that uses data and algorithms to mimic human learning, aiming to improve in accuracy (IBM, 2020).
The following methods were used to create two different algorithms with the goal of accurately predicting Ethereum prices using one year of historic closing price data.
Linear Regression
One of the most common statistical and Machine Learning methods, Linear Regression seeks to understand the relationship between numerical input and output variables.
Forecasting with Prophet
The Prophet package in R was produced by Facebook as a Machine Learning model for time series data. The model is open-source and available for anyone to use to accurately and automatically forecast data.
Sample Data
Table 1: Sample Data from "ETH-USD.csv" 4/30/2021-4/30/2022 (Yahoo Finance)
Date        Open         High         Low          Close        Adj Close    Volume
4/30/2021   2757.734131  2796.054932  2728.169922  2773.207031  2773.207031  29777179889
5/1/2021    2772.838379  2951.440918  2755.908447  2945.892822  2945.892822  28726205272
5/2/2021    2945.560059  2984.891846  2860.526123  2952.056152  2952.056152  28032013047
5/3/2021    2951.175781  3450.037842  2951.175781  3431.086182  3431.086182  49174290212
5/4/2021    3431.131592  3523.585938  3180.742676  3253.629395  3253.629395  62402045158
5/5/2021    3240.554688  3541.462646  3213.101563  3522.783203  3522.783203  48334198383
5/6/2021    3524.930908  3598.895996  3386.23999   3490.880371  3490.880371  44300394788
5/7/2021    3490.105225  3573.290039  3370.261963  3484.729004  3484.729004  39607240515
5/8/2021    3481.988037  3950.165039  3453.768555  3902.647705  3902.647705  50208491286
5/9/2021    3911.463135  3981.259033  3743.989014  3928.844727  3928.844727  50568290278
Data Exploration
Figure 2 shows the trend of Ethereum (ETH) closing prices from April 2021 to April 2022. The cryptocurrency has gone through a rollercoaster in the last year, partially due to the COVID-19 pandemic and its effect on the entire world, including crypto. However, 2021 was a particularly good year for ETH, which climbed from below $2,000 in July to over $4,800 in November 2021. Currently, there seem to be signs of a downward trend, and this analysis aims to clarify whether that trend will continue or whether there will be a recovery.
In Figure 3, there is a negligible difference between the opening and closing prices, which share an almost identical mean:
> mean(data$Close)
[1] 3186.631
> mean(data$Open)
[1] 3187.535
In the data itself, there are days where the opening and closing prices vary greatly, but when averaged they end up being extremely similar. The same holds for all summary statistics of the opening and closing prices.
Figure 2: Ethereum Closing Prices (4/30/2021-4/30/2022)
Figure 3: Ethereum Open vs. Closing Prices
> summary(data$Open)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1786    2636    3135    3188    3740    4810
> summary(data$Close)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1788    2635    3135    3187    3738    4812
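The comparison above can be reproduced in a few lines of base R. The sketch below is only an illustration: the data frame name `eth` is hypothetical, and it holds just the ten rows of Table 1 (rounded to cents) rather than the full one-year file, so its statistics will differ from the values reported above.

```r
# Ten rows from Table 1 (rounded to cents); NOT the full one-year dataset.
eth <- data.frame(
  Open  = c(2757.73, 2772.84, 2945.56, 2951.18, 3431.13,
            3240.55, 3524.93, 3490.11, 3481.99, 3911.46),
  Close = c(2773.21, 2945.89, 2952.06, 3431.09, 3253.63,
            3522.78, 3490.88, 3484.73, 3902.65, 3928.84)
)

# Five-number summary plus mean for each price column.
print(summary(eth$Open))
print(summary(eth$Close))

# Individual days can differ a lot even when the averages are close.
daily_gap <- eth$Close - eth$Open
mean_gap  <- mean(eth$Close) - mean(eth$Open)
cat("Largest single-day gap:", max(abs(daily_gap)), "\n")
cat("Gap between the means: ", mean_gap, "\n")
```

On a window this short and this strongly trending, the two means need not be as close as they are over the full year.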
Figure 4 shows the frequency distribution of Volume (the sum of all trades taking place), which appears to be skewed to the right but is otherwise roughly bell-shaped. The peak frequency is around 20B, and volume has exceeded 80B on the higher end. Figure 5 shows the relationship between closing prices and transaction volume: ETH has been at its highest price when a lower number of transactions were happening. However, when the volume is at its highest, the price of ETH rarely drops below $2,500.
Figure 4: Frequency of ETH Volume
Figure 5: ETH Closing Price vs. Volume
Figure 6: Correlation Between All ETH Variables
Figure 6 shows the correlation between all variables within the dataset. The price variables all appear to be highly correlated, which may lead to an overfitted model later on. This makes sense considering that all variables are directly related to the price of ETH and the dataset does not include any outside factors such as stock prices or other crypto prices and volume. Therefore, we can assume that each of these variables will be a good fit for the regression model.
However, one variable, the volume, appears to have a negative correlation with the closing price, adjusted close, and low. These negative correlations are very small and may not have any effect on the closing price, considering that all of these values are smaller than 0.01 in magnitude.
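A correlation matrix like the one in Figure 6 can be computed with cor() from base R. This is a minimal sketch on the ten rows of Table 1 (prices rounded to cents), not on the full dataset, so the coefficients will not match the figure; the data frame name `eth` is hypothetical, and Adj Close is left out because it equals Close in the sample.

```r
# Ten rows from Table 1 (rounded); NOT the full one-year dataset.
eth <- data.frame(
  Open   = c(2757.73, 2772.84, 2945.56, 2951.18, 3431.13,
             3240.55, 3524.93, 3490.11, 3481.99, 3911.46),
  High   = c(2796.05, 2951.44, 2984.89, 3450.04, 3523.59,
             3541.46, 3598.90, 3573.29, 3950.17, 3981.26),
  Low    = c(2728.17, 2755.91, 2860.53, 2951.18, 3180.74,
             3213.10, 3386.24, 3370.26, 3453.77, 3743.99),
  Close  = c(2773.21, 2945.89, 2952.06, 3431.09, 3253.63,
             3522.78, 3490.88, 3484.73, 3902.65, 3928.84),
  Volume = c(29777179889, 28726205272, 28032013047, 49174290212, 62402045158,
             48334198383, 44300394788, 39607240515, 50208491286, 50568290278)
)

# Pairwise Pearson correlations between all numeric columns.
corr <- cor(eth)
print(round(corr, 2))
```

Coefficients near 1 among the price columns are what signals the redundancy, and the overfitting risk, discussed above.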
Linear Regression
The main method of prediction will be Linear Regression. Linear Regression is a Machine Learning model that finds a linear relationship, or line of best fit, between an independent and a dependent variable (Deepanshi, 2021). For this model, the variables compared are the Open and Closing prices with respect to the Date; with more than one input variable, this is technically a Multiple Linear Regression model.
Random Seed
To begin, a random seed is set using the function set.seed(). R generates pseudorandom numbers with a deterministic algorithm, and fixing the seed makes that sequence repeatable. The point of this function is to make the random sampling in the next step reproducible: the same seed always yields the same sample.
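The effect of set.seed() is easy to check directly. In the sketch below the seed value 123 is arbitrary and chosen only for illustration; the paper does not state which seed was used.

```r
# Setting the same seed before each draw reproduces the same "random" sample.
set.seed(123)                      # 123 is an arbitrary, illustrative seed
first_draw <- sample(1:100, 5)

set.seed(123)                      # reset to the same seed
second_draw <- sample(1:100, 5)

print(first_draw)
print(identical(first_draw, second_draw))
```

Without the second set.seed() call, the second draw would continue the pseudorandom sequence and would generally differ from the first.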
Training and Testing Datasets
Next, training and testing datasets are created by random sampling and stored in data frames, with the seed making the split reproducible. This ensures there are two different sets of dates: one for fitting the regression model and one for making predictions.
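A random train/test split of this kind might look as follows. This is a sketch under stated assumptions: the 70/30 ratio, the seed, and the names `eth`, `train`, and `test` are illustrative choices, not values taken from the paper, and the data are only the ten rows of Table 1.

```r
# Ten rows from Table 1 (rounded); NOT the full one-year dataset.
eth <- data.frame(
  Date  = seq(as.Date("2021-04-30"), by = "day", length.out = 10),
  Open  = c(2757.73, 2772.84, 2945.56, 2951.18, 3431.13,
            3240.55, 3524.93, 3490.11, 3481.99, 3911.46),
  Close = c(2773.21, 2945.89, 2952.06, 3431.09, 3253.63,
            3522.78, 3490.88, 3484.73, 3902.65, 3928.84)
)

set.seed(123)                                  # arbitrary seed (assumption)
n         <- nrow(eth)
n_train   <- round(0.7 * n)                    # assumed 70/30 split
train_idx <- sample(seq_len(n), size = n_train)

train <- eth[train_idx, ]                      # rows used to fit the model
test  <- eth[-train_idx, ]                     # held-out rows for prediction

cat("Training rows:", nrow(train), " Testing rows:", nrow(test), "\n")
```

Because sample() draws without replacement by default, no date can end up in both data frames.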
Train Model
This is where the training dataset gets put to work: the lm() function fits the regression model to the training data in order to estimate the relationship between the variables. For this regression model, the opening and closing prices were compared with each other, and the fitted model is then passed to the prediction step.
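Fitting with lm() can be sketched as below. The formula `Close ~ Open` is an assumption about how the two prices were related in the original model, and the seed, split ratio, and object names are likewise illustrative; the data are the ten rows of Table 1.

```r
# Ten rows from Table 1 (rounded); NOT the full one-year dataset.
eth <- data.frame(
  Open  = c(2757.73, 2772.84, 2945.56, 2951.18, 3431.13,
            3240.55, 3524.93, 3490.11, 3481.99, 3911.46),
  Close = c(2773.21, 2945.89, 2952.06, 3431.09, 3253.63,
            3522.78, 3490.88, 3484.73, 3902.65, 3928.84)
)

set.seed(123)                                   # arbitrary seed (assumption)
train_idx <- sample(seq_len(nrow(eth)), size = 7)
train     <- eth[train_idx, ]

# Ordinary least squares fit: closing price as a linear function of the open.
# The formula is an assumed stand-in for the paper's model specification.
model <- lm(Close ~ Open, data = train)

print(coef(model))       # intercept and slope
print(summary(model))    # standard errors, R-squared, and related statistics
```

Additional predictors would be added on the right-hand side of the formula, for example `Close ~ Open + Volume`.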
Figure 7: Linear Regression (via analyticsvidhya.com)
Plot Residuals
Next, the residuals are plotted as a histogram. Since the residuals appear to be roughly normally distributed, a linear model is a reasonable fit for these variables, and the analysis can therefore proceed with the next steps.
Figure 8: Plotted Residuals
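A residual histogram of this kind can be drawn with residuals() and hist(). In this sketch the model is fitted on all ten rows of Table 1 purely for brevity, and the formula `Close ~ Open` is an assumption; with so few points the histogram is only a demonstration of the mechanics, not evidence of normality.

```r
# Ten rows from Table 1 (rounded); NOT the full one-year dataset.
eth <- data.frame(
  Open  = c(2757.73, 2772.84, 2945.56, 2951.18, 3431.13,
            3240.55, 3524.93, 3490.11, 3481.99, 3911.46),
  Close = c(2773.21, 2945.89, 2952.06, 3431.09, 3253.63,
            3522.78, 3490.88, 3484.73, 3902.65, 3928.84)
)

model <- lm(Close ~ Open, data = eth)   # assumed model form

# Residual = actual closing price minus the fitted value.
res <- residuals(model)

# A roughly symmetric, bell-shaped histogram centered on zero is the
# pattern the text describes as supporting the linear fit.
hist(res, main = "Residuals", xlab = "Residual (USD)")
```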
Prediction
The fitted regression model and the testing dataset are passed to the predict() function in R, which returns predicted values for the input data. The output of the prediction is then combined into a new data frame with the actual values from the original dataset and the date of each observation for comparison. That data frame is then melted (using the function melt()) so that the data can be visualized.
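The prediction and reshaping steps can be sketched as follows. The formula, seed, split, and object names are assumptions for illustration, and the long-format data frame is built with base R here as a stand-in for melt(), which produces a comparable layout.

```r
# Ten rows from Table 1 (rounded); NOT the full one-year dataset.
eth <- data.frame(
  Date  = seq(as.Date("2021-04-30"), by = "day", length.out = 10),
  Open  = c(2757.73, 2772.84, 2945.56, 2951.18, 3431.13,
            3240.55, 3524.93, 3490.11, 3481.99, 3911.46),
  Close = c(2773.21, 2945.89, 2952.06, 3431.09, 3253.63,
            3522.78, 3490.88, 3484.73, 3902.65, 3928.84)
)

set.seed(123)                                   # arbitrary seed (assumption)
train_idx <- sample(seq_len(nrow(eth)), size = 7)
train     <- eth[train_idx, ]
test      <- eth[-train_idx, ]

model     <- lm(Close ~ Open, data = train)     # assumed model form
predicted <- predict(model, newdata = test)     # predictions for held-out rows

# Wide format: one row per date, actual and predicted side by side.
comparison <- data.frame(Date      = test$Date,
                         Actual    = test$Close,
                         Predicted = unname(predicted))

# Long format (what melt() would give): one row per date and series.
long <- data.frame(
  Date   = rep(comparison$Date, 2),
  Series = rep(c("Actual", "Predicted"), each = nrow(comparison)),
  Price  = c(comparison$Actual, comparison$Predicted)
)
print(long)
```

The long layout is convenient for plotting both series against Date with one grouping variable.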
Linear Regression Prediction Results
Finally, the results of the Linear Regression model and predictions are plotted to see how the model performed against the actual data. The data was visualized in Excel since R's output was unable to show the minute differences between the predicted and actual values. As seen in Figure 9, the predicted values (orange) match up very well with the actual data (green).
Figure 9: Predicted Values vs. Actual
Since the model is likely overfitted, this result, while extremely promising, may not be as accurate as it appears. However, these are great results and serve as a foundation for understanding how the price of ETH could potentially be forecasted, similar to stocks in the stock market. There is one interesting difference, however: the predicted values (orange) start to show an upward trend towards the end of April 2022 while the actual values (green) show a downward trend, suggesting that ETH may be due for a turnaround after its latest downturn over the course of April 2022. To confirm this, data from May 2022 should be evaluated to see whether the model correctly projected a positive trend.
Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
The Mean Squared Error (MSE) is calculated by taking the average of the squared differences between the actual and predicted values. The Root Mean Squared Error (RMSE) is the square root of the MSE and can be read as the standard deviation of the prediction errors. Taking the root puts the error back in the units of the data, which makes it a direct measure of the model's accuracy (SmritiS, 2021). The calculated RMSE is approximately 135, meaning that the predicted values were typically about 135 units (dollars) away from the actual price data. Considering that ETH is valued under $5k, this gap could be troublesome if the prediction model were used for actual investments. However, a $135 difference can likely be reduced by tweaking the model in the future. For now, since this analysis is purely academic, this is a satisfying result.
Table 2: MSE and RMSE
MSE     18205.93
RMSE    134.9294
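Both metrics reduce to one line each in R. The sketch below uses made-up toy values (not data from this study) so the arithmetic is easy to follow by hand, and then checks that the square root of the reported MSE is consistent with the reported RMSE.

```r
# Toy values chosen for easy arithmetic; NOT data from the study.
actual    <- c(100, 200, 300)
predicted <- c(110, 190, 310)

errors <- actual - predicted        # -10, 10, -10
mse    <- mean(errors^2)            # average of the squared errors
rmse   <- sqrt(mse)                 # back in the original units (dollars)

cat("MSE: ", mse, "\n")
cat("RMSE:", rmse, "\n")

# Consistency check on the values reported in Table 2:
# the square root of the MSE should be about 134.93.
reported_rmse <- sqrt(18205.93)
cat("sqrt(18205.93) =", round(reported_rmse, 2), "\n")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, so it is not the same as the average absolute error.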
Linear Regression: Conclusion
Given the relatively low RMSE, the prediction model using Linear Regression could be interpreted as successful; the results give no reason to reject the hypothesis that Linear Regression Machine Learning methods can be used to accurately predict Ethereum (ETH) prices. However, more variables should be evaluated in the future, and more models should be tested, to assess accuracy with factors beyond the opening/closing price dataset, whose only additional variable is the transaction volume.
Price Prediction Utilizing Prophet
The Prophet package in R was produced by Facebook as a Machine Learning model for time series data. The model is open-source and available for anyone to use to accurately and automatically forecast data. Little work is needed beyond cleaning the data and passing it to functions in R; Prophet does the rest. This method is not intended as a legitimate method for ETH prediction, but as a supplement to the Linear Regression prediction done previously and for comparison purposes. The Prophet models can be tuned by hand for better accuracy within the R environment.
Setting Up Data
To begin with Prophet, the data and the appropriate packages must be loaded into R. The same dataset used for the Linear Regression model is used for the Prophet prediction model, with the closing prices as the main factor in determining the price fluctuations. One of the main differences with this model is that it actually forecasts future values instead of predicting over the existing dataset. To further set up the data, log transformations are applied to the prices and the dates are converted from a character field to a date field.
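The two preparation steps can be sketched in base R. Prophet expects an input data frame with a date column named `ds` and a value column named `y`; the names `raw` and `df` below are hypothetical, the date format is inferred from Table 1, and only three sample rows are used.

```r
# Three rows from Table 1, with dates as they appear in the CSV (character).
raw <- data.frame(
  Date  = c("4/30/2021", "5/1/2021", "5/2/2021"),
  Close = c(2773.207031, 2945.892822, 2952.056152),
  stringsAsFactors = FALSE
)

# Prophet's input convention: 'ds' holds dates, 'y' holds the values to model.
df <- data.frame(
  ds = as.Date(raw$Date, format = "%m/%d/%Y"),  # character -> Date
  y  = log(raw$Close)                           # natural-log transform
)

str(df)

# Forecasts made on the log scale are mapped back to dollars with exp().
print(exp(df$y))
```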
Plugging Data into Prophet Function
The data is then passed to the prophet() function for prediction. Prophet models the trend as either a piecewise linear or a logistic growth curve. This is similar to how a Regression Model works, but the algorithm in Prophet automatically detects changes in trends and works even with messy data. However, the better the data, the better the results of the model: the forecast reflects the richness of the dataset. A simple dataset yields a more simplistic projection, which is what will most likely happen with the ETH dataset. For a more accurate model, outside data other than opening and closing prices should be included.
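A minimal Prophet workflow follows the pattern below. It is a sketch, not the study's code: the 30-day horizon and the object names are assumptions, the input is only the ten closing prices from Table 1, and the fitting step is wrapped in a check so the script still runs where the prophet package is not installed.

```r
# Ten closing prices from Table 1 (rounded), log-transformed.
df <- data.frame(
  ds = seq(as.Date("2021-04-30"), by = "day", length.out = 10),
  y  = log(c(2773.21, 2945.89, 2952.06, 3431.09, 3253.63,
             3522.78, 3490.88, 3484.73, 3902.65, 3928.84))
)

fc <- NULL  # stays NULL if the prophet package is unavailable

if (requireNamespace("prophet", quietly = TRUE)) {
  suppressPackageStartupMessages(library(prophet))

  m      <- prophet(df)                               # fit with default settings
  future <- make_future_dataframe(m, periods = 30)    # assumed 30-day horizon
  fc     <- predict(m, future)                        # forecast on the log scale

  fc$yhat_usd <- exp(fc$yhat)                         # back to dollars
  print(tail(fc[, c("ds", "yhat", "yhat_usd")]))
} else {
  message("prophet is not installed; skipping the fit.")
}
```

With so little history, Prophet is expected to switch off some seasonal components on its own, so the result is close to a plain trend line; that is the kind of simplistic projection the paragraph above anticipates.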
Prophet Prediction Results
The Prophet prediction model in Figure 10 shows the trend of the actual data (black dots) along with a future projection beyond April 2022 and into May 2022. Similar to the Regression Model, this projection shows a positive trend for Ethereum despite the clear downward trend shown by the actual data. Therefore, this model suggests that ETH is due for an uptick in value over the summer of 2022.
Figure 10: Prophet Prediction Results
For the vast majority of the period, the model follows almost exactly the same trend as the actual price data; from February 2022 onward, the actual data fluctuates much more than the Prophet model.
Conclusions
"There is nothing that Bitcoin can do which Ethereum can't. While Ethereum is less battle-tested, it is moving faster, has better leadership, and has more developer mindshare"
- Fred Ehrsam
Overall, both models showed promise in accurately predicting the price of Ethereum, and with more supervised training the models could prove to be much more accurate in the future. It is hoped that more robust datasets can be acquired in the future to test different data against the closing prices and possibly improve the method of prediction.
Key Findings: Linear Regression
Can prove to be accurate
Longer process
Clear process
Key Findings: Prophet
Easy to build
Can use messier datasets
Unclear process
References
Cryptocurrency Price Prediction Using ARIMA Model. Analytics Vidhya, 3 Dec. 2021, https://www.analyticsvidhya.com/blog/2021/12/cryptocurrency-price-prediction-using-arima-model/.
Ethereum Price Today, ETH to USD Live, Marketcap and Chart. CoinMarketCap, https://coinmarketcap.com/currencies/ethereum/historical-data/. Accessed 11 May 2022.
Ethereum USD (ETH-USD) Price History & Historical Data - Yahoo Finance. https://finance.yahoo.com/quote/ETH-USD/history/. Accessed 11 May 2022.
How to Get Cryptocurrency Prices in R - Predictive Hacks. https://predictivehacks.com/how-to-get-cryptocurrency-prices-in-r/. Accessed 11 May 2022.
Linear Regression | Introduction to Linear Regression for Data Science. Analytics Vidhya, 25 May 2021, https://www.analyticsvidhya.com/blog/2021/05/all-you-need-to-know-about-your-first-machine-learning-model-linear-regression/.
Linear Regression - A Complete Introduction in R with Examples. Machine Learning Plus, 12 Mar. 2017, https://www.machinelearningplus.com/machine-learning/complete-introduction-linear-regression-r/.
Machine Learning with R: Linear Regression. Better Data Science, 25 Sept. 2020, https://betterdatascience.com/machine-learning-with-r-linear-regression/.
RPubs - Forecasting Bitcoin Price Based on Time Series Model. https://rpubs.com/praneetha/bitcoin_price_forecast. Accessed 11 May 2022.
Valkov, Venelin. Cryptocurrency Price Prediction Using LSTMs | TensorFlow for Hackers (Part III). Medium, 29 June 2019, https://towardsdatascience.com/cryptocurrency-price-prediction-using-lstms-tensorflow-for-hackers-part-iii-264fcdbccd3f.
What Is Ethereum? Investopedia, https://www.investopedia.com/terms/e/ethereum.asp. Accessed 11 May 2022.
What Is Machine Learning? https://www.ibm.com/cloud/learn/machine-learning. Accessed 11 May 2022.
What Is Mean Squared Error, Mean Absolute Error, Root Mean Squared Error and R Squared? - Studytonight. https://www.studytonight.com/post/what-is-mean-squared-error-mean-absolute-error-root-mean-squared-error-and-r-squared. Accessed 11 May 2022.
Contact
Robert Kozub
MS Geospatial Information Sciences
University of Texas at Dallas
rxk200039@utdallas.edu
linkedin.com/in/robert-kozub
robertkozub.github.io
Appendices:
Appendix A: Data
Appendix B: Linear Regression Source Code
Prophet Source Code