While trading may be the most exciting and lucrative domain of application of machine learning, it is also one of the most challenging. Trading is not only about buying or selling, nor is it just about analysing the financial state of a target company. One of the reasons why it is so difficult to be a top trader is that it requires considering a large amount of data of different kinds. This also explains the machine learning hype in trading: text, speech, numbers, images… machine learning algorithms can deal with almost any type of data. In this series of articles, we will introduce an implementation of an uncommon deep learning approach to stock price trend prediction based on financial news. Our inspiration comes from the recent research paper “Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction” (LCW).
Recent trends in research papers and blog articles
Many approaches introduced in research papers over the last few years suffer from incompleteness. One of those approaches consists of designing an algorithm based only on the stock prices of the previous days. Recurrent Neural Networks
What makes this paper so special?
Nowadays, AI is trying to become more human, and algorithmic trading is no exception. Some recently published research papers try to design frameworks that imitate real investors. LCW is one of them, and that is why we chose it.
Where is the innovation?
The authors have taken into account three characteristics of the learning process followed by an investor struggling with the “chaotic news”:
- First, the Sequential Context Dependency. This simply refers to the fact that a single piece of news is more informative within a broader context than in isolation.
- Second, the Diverse Influence. One critical piece of news can affect the stock price for weeks, whereas a trivial one may have no effect at all.
- Third, the “Effective and Efficient Learning”. It means learning from the more common situations before turning to exceptional cases.
This is not a theoretical paper but rather an applied, engineering-oriented one. The design process might have gone something like this:
– We have to deal with sequences of press articles. What neural network could we use for that?
– Recurrent neural networks.
– OK. Which one is easier to train and does not kill the gradient?
– A GRU network.
– Good!
– Now, how do we deal with diverse influence? I want my algorithm to focus on the most important articles.
– An attention mechanism!
– Alright. Now give me a simple neural network to perform a three-class classification.
– A multilayer perceptron!
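The pipeline from this dialogue (a GRU over a sequence of days, a way to focus on the most important articles, and an MLP for three classes) can be written compactly in equations. What follows is a generic additive-attention formulation under our own assumptions: the symbols ($n_{t,i}$ for the vector of article $i$ on day $t$, $L$ articles per day, $N$ days, and the weights $W, b, u, W', b', v$) are ours, and the exact scoring functions used in LCW may differ.

$$\alpha_{t,i} = \frac{\exp\big(u^\top \tanh(W n_{t,i} + b)\big)}{\sum_{j=1}^{L} \exp\big(u^\top \tanh(W n_{t,j} + b)\big)}, \qquad d_t = \sum_{i=1}^{L} \alpha_{t,i}\, n_{t,i}$$

$$h_t = \mathrm{GRU}(d_t, h_{t-1}), \qquad t = 1, \dots, N$$

$$\beta_t = \frac{\exp\big(v^\top \tanh(W' h_t + b')\big)}{\sum_{s=1}^{N} \exp\big(v^\top \tanh(W' h_s + b')\big)}, \qquad c = \sum_{t=1}^{N} \beta_t\, h_t$$

$$\hat{y} = \mathrm{softmax}\big(\mathrm{MLP}(c)\big) \in \mathbb{R}^3$$

The weights $\alpha_{t,i}$ let a critical article dominate its day's vector $d_t$ (Diverse Influence), while the GRU state $h_t$ carries information from one day to the next (Sequential Context Dependency).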
There you have it:
As you can see,
They have also implemented a Self-Paced Learning algorithm. It aims at performing
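To illustrate the principle behind self-paced learning (train on easy, low-loss samples first, then progressively admit harder ones), here is a minimal NumPy sketch of the common hard-threshold variant: a sample is kept when its current loss is below a threshold `lam`, and `lam` grows every round. The toy logistic-regression model, the growth factor, the parameter values and the function names are hypothetical stand-ins; this is not necessarily the exact variant used in LCW.

```python
import numpy as np


def spl_weights(losses, lam):
    """Hard self-paced weights: v_i = 1 if loss_i < lam, else 0.

    For fixed model parameters, this minimises
    sum_i v_i * loss_i - lam * sum_i v_i over v_i in {0, 1},
    so only "easy" (low-loss) samples are kept.
    """
    return (np.asarray(losses, dtype=float) < lam).astype(float)


def logistic_losses(w, X, y):
    """Per-sample cross-entropy of a logistic-regression stand-in model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)), p


def self_paced_fit(X, y, lam=0.6, growth=1.2, rounds=8, lr=0.5, steps=50, seed=0):
    """Alternate between selecting easy samples and fitting on them (toy model)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    admitted = []  # number of samples used in each round
    for _ in range(rounds):
        losses, _ = logistic_losses(w, X, y)
        v = spl_weights(losses, lam)  # step 1: fix w, choose the easy samples
        admitted.append(int(v.sum()))
        if v.sum() > 0:  # step 2: fix v, run gradient descent on the weighted loss
            for _ in range(steps):
                _, p = logistic_losses(w, X, y)
                w -= lr * X.T @ (v * (p - y)) / v.sum()
        lam *= growth  # step 3: raise the threshold to admit harder samples
    return w, admitted


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    w, admitted = self_paced_fit(X, y)
    print(admitted)
```

In a Keras model, the resulting 0/1 vector could be passed as the `sample_weight` argument of `fit` instead of running a hand-written gradient loop.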
Pierrick and I are two French engineering students, currently in the second year of our Master's program in engineering. We are by no means experts in trading, and we are beginners in machine learning. This is our first paper implementation, and our first technical blog post as well, so we welcome any constructive criticism of both our code and our articles.
Our workflow is divided into four steps:
- Data Collection
For the purpose of this research project, we scraped all the articles published on reuters.com from 2015 to 2017. We mainly used the BeautifulSoup and urllib libraries, along with the multiprocessing library. And yes, the whole project is in Python 3.
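A minimal sketch of such a scraper with these three libraries, assuming you already have a list of article URLs. The HTML selectors (first `<h1>` for the title, `<p>` tags for the body), the User-Agent header and the helper names `parse_article`, `fetch_article` and `scrape_all` are hypothetical: the actual reuters.com markup of 2015 to 2017 may well differ.

```python
from multiprocessing import Pool
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def parse_article(html):
    """Extract a title and body text from one article page.

    ASSUMPTION: the title is the first <h1> and the body is made of <p> tags.
    """
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1")
    title = title_tag.get_text(strip=True) if title_tag else ""
    body = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return {"title": title, "body": body}


def fetch_article(url):
    """Download and parse one page; return None on network or HTTP errors."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return None
    article = parse_article(html)
    article["url"] = url
    return article


def scrape_all(urls, processes=8):
    """Fetch pages in parallel worker processes and drop failed downloads."""
    with Pool(processes) as pool:
        results = pool.map(fetch_article, urls)
    return [r for r in results if r is not None]
```

Since most of the time is spent waiting on the network, the worker pool mainly helps by overlapping requests. On platforms that spawn worker processes (Windows, macOS), `scrape_all` should be called under an `if __name__ == "__main__":` guard.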
- Article Vectorization
We chose not to follow the paper on this part. After collecting more than 1 million articles (see Section 5.1.1 of the paper), the authors trained a Word2Vec model on the whole vocabulary of their articles, then represented each article by the mean of the vectors of all its words. We preferred Doc2Vec, which learns a vector for each document directly, to obtain a better representation of each article. Our choice was inspired by this comparison. We used the Gensim library for that.
- Dataset Creation
This part consisted
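A hypothetical sketch of how LCW-style training samples could be assembled, under our own assumptions: each sample stacks `n_days` consecutive trading days, each day holds at most `max_news` article vectors (zero-padded or truncated), and the label is one of three classes derived from the next day's closing-price change. The 0.5% threshold, the class order (0 = down, 1 = stable, 2 = up) and all function names are illustrative choices.

```python
import numpy as np


def day_tensor(vectors, max_news, dim):
    """Stack up to max_news article vectors of one day, zero-padding the rest."""
    out = np.zeros((max_news, dim), dtype=np.float32)
    for i, vec in enumerate(vectors[:max_news]):
        out[i] = vec
    return out


def label_return(ret, threshold=0.005):
    """Map a relative price change to 0 = down, 1 = stable, 2 = up."""
    if ret < -threshold:
        return 0
    if ret > threshold:
        return 2
    return 1


def build_samples(news_by_day, closes, n_days, max_news, dim, threshold=0.005):
    """news_by_day[t]: article vectors of trading day t; closes[t]: closing price of day t.

    The sample ending on day t uses the news of days t-n_days+1..t and is
    labelled with the change from close t to close t+1.
    """
    assert len(news_by_day) == len(closes)
    days = [day_tensor(v, max_news, dim) for v in news_by_day]
    X, y = [], []
    for t in range(n_days - 1, len(closes) - 1):
        X.append(np.stack(days[t - n_days + 1 : t + 1]))
        y.append(label_return(closes[t + 1] / closes[t] - 1.0, threshold))
    return np.array(X), np.array(y)
```

The resulting `X` has shape `(samples, n_days, max_news, dim)`, which is the four-dimensional layout a per-day, per-article model expects.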
- Model Training
We used Keras to build the model. The TimeDistributed wrapper was of great use to apply
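A minimal Keras sketch of this kind of model, assuming inputs shaped `(days, articles per day, vector size)`: `TimeDistributed` applies the same article-level attention encoder to every day, a GRU runs over the resulting daily vectors, a second attention pools the GRU states over the days, and a small MLP outputs three-class probabilities. This is an illustrative reconstruction, not the authors' code; the layer sizes and the helper names `attention_pool`, `build_news_encoder` and `build_model` are hypothetical.

```python
import numpy as np
import keras
from keras import layers


def attention_pool(x, units):
    """Additive attention over axis 1: softmax(u^T tanh(W x_i + b)), then weighted sum."""
    hidden = layers.Dense(units, activation="tanh")(x)  # (batch, steps, units)
    scores = layers.Dense(1)(hidden)                    # (batch, steps, 1)
    weights = layers.Softmax(axis=1)(scores)            # normalised over the steps
    pooled = layers.Dot(axes=1)([weights, x])           # (batch, 1, features)
    return layers.Flatten()(pooled)                     # (batch, features)


def build_news_encoder(max_news, dim):
    """Day encoder: attention over the (zero-padded) articles of a single day."""
    articles = keras.Input(shape=(max_news, dim))
    return keras.Model(articles, attention_pool(articles, dim))


def build_model(n_days, max_news, dim, gru_units=64, mlp_units=32):
    days = keras.Input(shape=(n_days, max_news, dim))
    # Apply the same news encoder to each of the n_days time steps.
    day_vectors = layers.TimeDistributed(build_news_encoder(max_news, dim))(days)
    states = layers.GRU(gru_units, return_sequences=True)(day_vectors)
    summary = attention_pool(states, gru_units)         # temporal attention over days
    hidden = layers.Dense(mlp_units, activation="relu")(summary)
    probs = layers.Dense(3, activation="softmax")(hidden)  # three trend classes
    model = keras.Model(days, probs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


if __name__ == "__main__":
    model = build_model(n_days=5, max_news=4, dim=8, gru_units=6, mlp_units=4)
    x = np.random.rand(2, 5, 4, 8).astype("float32")
    print(model.predict(x, verbose=0).shape)
```

One simplification to keep in mind: zero-padded article slots still receive a (small) attention weight here, since no mask is propagated; a masked softmax would exclude them.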
Implementing this paper was thrilling, and we look forward to writing about each step of the implementation. Many thanks to the authors for this inspiring paper.
More to come: we will release the code soon!