
Date: 2024-04-02 Source: hfw.cc (合肥网) Author: hfw.cc

Financial Machine Learning (CMSE11475)
Group Project Assignment
2023/2024
Content
Project Description
Individual Project
Project Deadline and Submission
Project topic
Project Hints
Suggested Topics
Forecasting Limit Order Book
Forecasting Stock Volatility
Forecasting High Frequency Cryptocurrency Return
Project Description
The project aims to practice the use of state-of-the-art machine learning models to analyse financial data and
solve financial problems.
Individual Project:
This is an individual project; no group is required. Each student shall select their own topic and data and
answer their own research question alone. Cooperation and discussion during the learning process are
encouraged, but the project must be completed as each student's own work, not as group work.
Project Deadline and Submission:
Individual projects run from 15th January 2024 (week 1) to 29th March 2024 (week 10).
The deadline of submission is 14:00, Thursday, 4th April 2024.
The submission of the project includes the project report and all implementation code (do NOT submit any
data). The code shall work on the originally provided datasets. The report and the code shall be ZIPPED into
one package for submission.
The report MUST follow the given template. All sections are required. The code MUST have complete and
detailed comments for every major logical section.
Project topic
Each student should individually choose one of the following suggested topics (with provided data) as
their own project. You are encouraged to revise or improve the project topic to make it more practical,
challenging, and suitable for your own research question. It is fine if many students select the same suggested
topic, as long as their code and project reports are significantly different.
The aim of this project is to apply at least THREE out of five techniques illustrated in the course (Deep Neural
Network; XGBoost; Cross-validation; Ensemble Model; Interpretability) to solve a financial problem.
Project Hints
All suggested topics are based on the computer lab examples with some changes and extensions. You can
easily find similar methods and models in the computer lab examples. Carefully studying those examples
and their code is crucial for understanding this course and completing the coursework.
Suggested Topics
Forecasting Limit Order Book
Topic
Can we use a deep neural network to forecast high-frequency stock returns at multiple horizons using
limit order book information?
Data
10-level high-frequency Limit Order Book data of five stocks: Apple, Amazon, Intel, Microsoft, and Google, on 21st
June 2012. File sizes range from 40MB to 100+MB. You may use only part of the data.
Method
You may define the following features:
1) Limit Order Book (LOB)
$a_t^i$ and $b_t^i$ are the ask and bid prices of the 10 levels ($i = 1, \dots, 10$), and $v_t^{i,a}$ and $v_t^{i,b}$ are the volumes of the 10 levels ($i = 1, \dots, 10$). Stacking these four quantities over the 10 levels gives
$$s_t^{LOB} \in \mathbb{R}^{40}$$
2) Bid-Ask Order Flow (OF)
$$bOF_{t,i} = \begin{cases} v_t^{i,b}, & \text{if } b_t^i > b_{t-1}^i \\ v_t^{i,b} - v_{t-1}^{i,b}, & \text{if } b_t^i = b_{t-1}^i \\ -v_t^{i,b}, & \text{if } b_t^i < b_{t-1}^i \end{cases}$$
$$aOF_{t,i} = \begin{cases} v_t^{i,a}, & \text{if } a_t^i > a_{t-1}^i \\ v_t^{i,a} - v_{t-1}^{i,a}, & \text{if } a_t^i = a_{t-1}^i \\ -v_t^{i,a}, & \text{if } a_t^i < a_{t-1}^i \end{cases}$$
Stacking $bOF_{t,i}$ and $aOF_{t,i}$ over the 10 levels gives an order flow vector in $\mathbb{R}^{20}$.
3) Order Flow Imbalance (OFI)
$$OFI_{t,i} = bOF_{t,i} - aOF_{t,i}$$
With one value per level, $OFI_t \in \mathbb{R}^{10}$.
The features can be defined as a vector
$$\mathbf{X}_t = (s_t^{LOB}, bOF_{t,i}, aOF_{t,i}, OFI_t)^T$$
The total dimension of the feature vector $\mathbf{X}_t$ is 40+20+10=70, so $\mathbf{X}_t \in \mathbb{R}^{70}$.
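The order flow definitions above can be sketched in NumPy as follows. This is a minimal illustration, not the lab implementation: the function name `order_flow_features`, the array layout (one row per LOB update, one column per level), and the toy numbers are assumptions. Two levels are used instead of ten to keep the example readable.

```python
import numpy as np

def order_flow_features(bid_px, bid_vol, ask_px, ask_vol):
    """Compute bOF, aOF and OFI from arrays of shape (T, levels).

    Row 0 has no predecessor, so each output has T - 1 rows.
    """
    def one_side(px, vol):
        prev_px, cur_px = px[:-1], px[1:]
        prev_vol, cur_vol = vol[:-1], vol[1:]
        # Piecewise rule from the text: price up, price unchanged, price down.
        return np.where(
            cur_px > prev_px, cur_vol,
            np.where(cur_px == prev_px, cur_vol - prev_vol, -cur_vol),
        )

    b_of = one_side(bid_px, bid_vol)
    a_of = one_side(ask_px, ask_vol)
    return b_of, a_of, b_of - a_of

# Toy order book with 3 updates and 2 levels (hypothetical numbers).
bid_px = np.array([[10.0, 9.9], [10.1, 9.9], [10.0, 9.9]])
bid_vol = np.array([[100.0, 50.0], [80.0, 70.0], [60.0, 70.0]])
ask_px = np.array([[10.2, 10.3], [10.2, 10.4], [10.3, 10.4]])
ask_vol = np.array([[30.0, 40.0], [45.0, 20.0], [25.0, 20.0]])

b_of, a_of, ofi = order_flow_features(bid_px, bid_vol, ask_px, ask_vol)
# Stack everything into one feature matrix (4*levels + 2*levels + levels columns).
lob = np.hstack([ask_px, ask_vol, bid_px, bid_vol])[1:]
X = np.hstack([lob, b_of, a_of, ofi])
print(X.shape)  # (2, 14) here; (T - 1, 70) with 10 levels
```

With 10 levels the same stacking yields the 70 columns described in the text. The column order inside `X` is a free choice as long as it is used consistently.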
The target is the LOB mid-point return $\mathbf{r}_t$ over $H$ future horizons ($H \ge 1$):
$$\mathbf{r}_t = (r_{t,1}, \dots, r_{t,H})^T$$
This project is to estimate the function $f(\cdot)$, which takes a sequence of historical $\mathbf{X}_t$ as input and generates the
vector $\mathbf{r}_t$ as output:
$$\mathbf{r}_t = f(\mathbf{X}_t, \mathbf{X}_{t-1}, \mathbf{X}_{t-2}, \dots, \mathbf{X}_{t-W})$$
where $W$ is the look-back window and $\mathbf{r}_t = (r_{t,1}, \dots, r_{t,H})^T$ with horizon index $j = 1, \dots, H$.
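A sequence model needs the feature history cut into look-back windows paired with multi-horizon targets. The sketch below shows one way to do this in NumPy. The helper name `make_windows` is hypothetical, and the return definition (simple return of the mid-price from time t to t + j) is an assumption, because the text does not spell out how the mid-point return is computed.

```python
import numpy as np

def make_windows(features, mid, W, H):
    """Pair look-back windows with multi-horizon mid-price returns.

    inputs[k]  has shape (W + 1, n_features): X_{t-W}, ..., X_t
    targets[k] has shape (H,): assumed r_{t,j} = mid[t + j] / mid[t] - 1
    """
    T = len(mid)
    inputs, targets = [], []
    for t in range(W, T - H):
        inputs.append(features[t - W : t + 1])
        targets.append(mid[t + 1 : t + H + 1] / mid[t] - 1.0)
    return np.array(inputs), np.array(targets)

# Synthetic stand-ins for the 70-dimensional features and the mid-price.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 70))
mid = 100.0 + np.cumsum(rng.normal(scale=0.01, size=50))

X_seq, r = make_windows(features, mid, W=9, H=3)
print(X_seq.shape, r.shape)  # (38, 10, 70) (38, 3)
```

The resulting three-dimensional array (samples, time steps, features) is the usual input layout for recurrent layers, so different values of W can be compared by rebuilding the windows.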
This topic shall use LSTM as one of the potential models. You may try to train the LSTM model with the raw
70-dimensional features $\mathbf{X}_t$ with different $W$. You may also extract features of lower dimension $M < 70$
with an autoencoder and then train the LSTM model on the extracted features with different $W$. You can provide
a comparison of those two methods.
This project shall also address the question of feature importance.
Forecasting Stock Volatility
Topic
This topic comprises two subtopics, both pertaining to volatility forecasting. These subtopics are as follows:
1) Is stock volatility path-dependent?
2) Is stock volatility past-dependent?
To address these questions, you have the option to employ various machine learning models for forecasting
stock return volatility. This can be achieved either by utilising past returns (path-dependent) or past volatilities
(past-dependent).
Addressing either of the aforementioned sub-questions fulfils the coursework requirements for the
FML course. There is no need to complete work for both questions.
Data
In computer lab_3_1, we show the method to download stock prices from Yahoo Finance. This topic uses the
stock's adjusted prices to calculate its volatility. You shall calculate the volatility as the standard deviation of the
daily arithmetic returns, but it's essential to note that this volatility should be computed from returns
within distinct, non-overlapping intervals of a fixed number of days, which can be five or ten. The following figure shows the
volatility calculation, where $r_i$ is the daily return and $\sigma_i$ is the five-day volatility.
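The non-overlapping volatility calculation can be sketched as below. The function name `block_volatility`, the toy prices, and the use of the sample standard deviation (`ddof=1`) are assumptions; the text only asks for the standard deviation of the daily arithmetic returns inside each block.

```python
import numpy as np

def block_volatility(prices, block=5):
    """Daily arithmetic returns and their std over non-overlapping blocks."""
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0
    n_blocks = len(returns) // block          # drop an incomplete last block
    blocks = returns[: n_blocks * block].reshape(n_blocks, block)
    return returns, blocks.std(axis=1, ddof=1)

# Eleven hypothetical adjusted closing prices give ten returns, i.e. two 5-day blocks.
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 102.0,
          105.0, 107.0, 106.0, 108.0, 110.0]
returns, vols = block_volatility(prices, block=5)
print(len(returns), len(vols))  # 10 2
```

Because each return belongs to exactly one block, consecutive volatility values share no data, which is the property the text requires.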
To successfully complete the coursework, you must choose a minimum of two stocks to assess one of the
aforementioned questions. The selection of these stocks should align with your personal interests.
Method
The topic is to investigate whether the volatility is path-dependent or past-dependent. However, the length $L$ of
the path or past is unknown. You can select $L$ as 5, 10, 15, 20, or 40 days in the investigation and conclude
with the best $L$. Please decide for yourself which lengths $L$ to use in your coursework.
For the path-dependent question, the input features contain the daily returns of the past $L$ days:
$$\mathbf{X}_t = (r_{t-1}, r_{t-2}, r_{t-3}, \dots, r_{t-L})^T$$
The output is the volatility $y_t = \sigma_t$. Please be aware that the returns in $\mathbf{X}_t$ shall not be included in the
calculation of the output volatility $y_t$. As illustrated in the figure below, to forecast the volatility $\sigma_t$, you can use
the daily returns $r_{t-1}, r_{t-2}, \dots, r_{t-L}$ of the past $L$ days.
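A possible way to build the path-dependent dataset is sketched below. It assumes that the target volatility is computed on a non-overlapping block of daily returns and that the features are the L returns immediately before that block, so no return is shared between input and output. The name `path_dataset` and the synthetic returns are illustrative only.

```python
import numpy as np

def path_dataset(returns, L, block=5):
    """Features: the L returns before each block. Target: std of the block."""
    returns = np.asarray(returns, dtype=float)
    X, y = [], []
    for start in range(0, len(returns) - block + 1, block):
        if start < L:
            continue                                  # not enough history yet
        X.append(returns[start - L : start][::-1])    # most recent return first
        y.append(returns[start : start + block].std(ddof=1))
    return np.array(X), np.array(y)

returns = np.arange(30) / 1000.0    # synthetic daily returns
X, y = path_dataset(returns, L=10, block=5)
print(X.shape, y.shape)  # (4, 10) (4,)
```

Rebuilding the dataset for each candidate L (5, 10, 15, 20, 40) and comparing out-of-sample errors is one way to pick the best length.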
For the past-dependent question, the input features contain the previous $L$ volatilities:
$$\mathbf{X}_t = (\sigma_{t-1}, \sigma_{t-2}, \sigma_{t-3}, \dots, \sigma_{t-L})^T$$
The output is the volatility $y_t = \sigma_t$.
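For the past-dependent case, the dataset is a plain lag matrix of the volatility series. The sketch below assumes the volatilities have already been computed on non-overlapping blocks; `past_dataset` is a hypothetical helper name and the numbers are synthetic.

```python
import numpy as np

def past_dataset(vols, L):
    """Features: the previous L volatilities. Target: the current volatility."""
    vols = np.asarray(vols, dtype=float)
    X = np.array([vols[t - L : t][::-1] for t in range(L, len(vols))])
    y = vols[L:]
    return X, y

vols = np.arange(8, dtype=float)    # synthetic volatility series
X, y = past_dataset(vols, L=3)
print(X[0], y[0])  # [2. 1. 0.] 3.0
```

Each row holds the lags ordered from most recent to oldest, matching the order of the feature vector in the text.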
This topic may use any machine learning model.
This topic may also answer which length $L$ generates the best forecasting results for path- and past-dependence.
Forecasting High Frequency Cryptocurrency Return
Topic
This topic is to study how machine learning models perform in forecasting 15-minute ahead return in any of
the 14 popular cryptocurrencies.
Data
A dataset “cryptocurrency_prices.csv” of millions of rows of minute-frequency market data dating back to
2018 is provided for building the model. The dataset contains 14 popular cryptocurrencies, distinguished by
asset IDs. The details of the asset IDs and names are in the file “asset_details.csv”. You may choose any
cryptocurrencies to forecast. The “Weight” in the file is used to calculate the return of the whole cryptocurrency
market and is introduced in the Appendix.
Asset_ID  Weight       Asset_Name
2         2.397895273  Bitcoin Cash
0         4.304065093  Binance Coin
1         6.779921907  Bitcoin
5         1.386294361  EOS.IO
7         2.079441542  Ethereum Classic
6         5.894402834  Ethereum
9         2.397895273  Litecoin
11        1.609437912  Monero
13        1.791759469  TRON
12        2.079441542  Stellar
3         4.406719247  Cardano
8         1.098612289  IOTA
10        1.098612289  Maker
4         3.555348061  Dogecoin
In the file “cryptocurrency_prices.csv”, the target has been calculated and provided as the column “Target”.
The target is derived from the log return over the future 15 minutes: for each cryptocurrency asset $a$, it is the
residual of the 15-minute log return, $\mathrm{Target}_t^a$. Note that, in each row, the “Target” has already been aligned as
the future 15-minute return residual and is the value to be forecast. (Target: residual log-returns for the asset over
a 15-minute horizon.)
The dataset includes the following features:
timestamp: All timestamps are Unix timestamps in seconds (the number of seconds elapsed since
1970-01-01 00:00:00.000 UTC). Timestamps in this dataset are multiples of 60, indicating minute-by-minute
data.
Asset_ID: The asset ID corresponding to one of the cryptocurrencies (e.g. Asset_ID = 1 for Bitcoin). The mapping
from Asset_ID to crypto asset is contained in asset_details.csv.
Count: Total number of trades in the time interval (last minute).
Open: Opening price of the time interval (in USD).
High: Highest price reached during time interval (in USD).
Low: Lowest price reached during time interval (in USD).
Close: Closing price of the time interval (in USD).
Volume: Quantity of asset bought or sold, displayed in base currency USD.
VWAP: The average price of the asset over the time interval, weighted by volume. VWAP is an aggregated
form of trade data.
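As a small illustration of the timestamp field described above, the snippet below converts second-resolution Unix timestamps to UTC datetimes and checks that they fall on whole minutes. The three sample values are hypothetical and are not read from the provided file.

```python
from datetime import datetime, timezone

# Hypothetical sample values: consecutive minutes on 2018-01-01 (UTC).
timestamps = [1514764860, 1514764920, 1514764980]

# Minute-by-minute data: every timestamp is a multiple of 60 seconds.
on_minute = all(ts % 60 == 0 for ts in timestamps)

times = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in timestamps]
print(on_minute, times[0].isoformat())  # True 2018-01-01T00:01:00+00:00
```

Passing `tz=timezone.utc` avoids an implicit conversion to the local time zone of the machine running the code.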
Method
You may define some additional features. For example, the past 5 minute log return, the past 5 minute
absolute log return, past 5 minute highest, past 5 minute lowest, etc.
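The additional features mentioned above could be built along these lines. This is a sketch under assumptions: the rows belong to a single asset, are sorted by time with no missing minutes, and the function name `rolling_features` and the toy prices are made up for illustration.

```python
import numpy as np

def rolling_features(close, high, low, k=5):
    """Past k-minute log return, its absolute value, and rolling high/low."""
    close, high, low = (np.asarray(a, dtype=float) for a in (close, high, low))
    n = len(close)
    log_ret = np.full(n, np.nan)
    log_ret[k:] = np.log(close[k:] / close[:-k])   # close_t versus close_{t-k}
    past_high = np.full(n, np.nan)
    past_low = np.full(n, np.nan)
    for t in range(k - 1, n):
        past_high[t] = high[t - k + 1 : t + 1].max()   # highest High in last k rows
        past_low[t] = low[t - k + 1 : t + 1].min()     # lowest Low in last k rows
    return {
        "log_ret": log_ret,
        "abs_log_ret": np.abs(log_ret),
        "past_high": past_high,
        "past_low": past_low,
    }

close = np.array([100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0])
feats = rolling_features(close, high=close + 1.0, low=close - 1.0, k=5)
print(feats["past_high"][4], feats["past_low"][4])  # 105.0 99.0
```

Rows without enough history are left as NaN so that they can be dropped before training rather than silently filled.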
You may try simple models, e.g., linear or tree models, and complex models, e.g., LSTM, and compare their forecasting
performance.
If using LSTM, you may also study which length of the look-back window provides the best forecasting
performance.
In addition, the feature importance shall also be studied to show which features contribute most to forecasting the
future relative performance of the asset.
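One model-agnostic way to study feature importance is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The text does not prescribe a method, so this is only one option. The sketch uses synthetic data and a least-squares fit as a stand-in for whichever model is actually trained; the function name is hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, y, rng, n_repeats=10):
    """Average increase in mean squared error when one column is shuffled."""
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the link to the target
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats

# Synthetic data: feature 0 matters a lot, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=500)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # stand-in model
scores = permutation_importance(lambda A: A @ coef, X, y, rng)
print(int(np.argmax(scores)))  # 0
```

In practice the scores should be computed on held-out data, so that they reflect forecasting value rather than in-sample fit.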
Appendix
This appendix introduces how the target is calculated.
The log return at time $t$ for asset $a$ is calculated as:
$$R_t^a = \log\left(\frac{P_{t+16}^a}{P_{t+1}^a}\right)$$
As the crypto asset returns are highly correlated, forecasting returns for an individual asset shall remove the
market signal from the individual asset returns. Therefore, the weighted average cryptocurrency market return $M_t$
is defined as:
$$M_t = \frac{\sum_a w^a R_t^a}{\sum_a w^a}$$
where $w^a$ is the weight for each cryptocurrency and is defined in the column “Weight” in the file
“asset_details.csv”.
Then, a beta is calculated for each asset:
$$\beta^a = \frac{\langle M \cdot R^a \rangle}{\langle M^2 \rangle}$$
where the bracket $\langle \cdot \rangle$ calculates the rolling window average over the past 3750-minute windows.
Then, a regression residual is defined as the target for each asset:
$$\mathrm{Target}_t^a = R_t^a - \beta^a M_t$$
BUT, you don’t need to do this calculation. The target values have already been calculated and provided.
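For readers who want to see how such a residual target could be computed, here is a minimal NumPy sketch. It is built on assumptions: the market return is taken as the weight-normalised average of the asset returns, beta as the ratio of rolling averages of M·R and M², and the residual as R minus beta times M. The function names, the synthetic prices, and the short window of 20 (instead of 3750) are illustrative only.

```python
import numpy as np

def rolling_mean(x, window):
    """Trailing mean over `window` rows; earlier rows are NaN."""
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0, axis=0), axis=0)
    out[window - 1 :] = (csum[window:] - csum[:-window]) / window
    return out

def residual_targets(prices, weights, window=3750):
    """prices: (T, n_assets) minute prices; returns residuals of shape (T - 16, n_assets)."""
    prices = np.asarray(prices, dtype=float)
    w = np.asarray(weights, dtype=float)
    R = np.log(prices[16:] / prices[1:-15])      # R[t] = log(P[t+16] / P[t+1])
    M = (R * w).sum(axis=1) / w.sum()            # assumed weighted market return
    beta = rolling_mean(M[:, None] * R, window) / rolling_mean(M**2, window)[:, None]
    return R - beta * M[:, None]                 # assumed regression residual

# Synthetic prices for three assets; weights are those listed for
# Ethereum, Dogecoin and Monero in the asset table.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(scale=1e-3, size=(200, 3)), axis=0))
weights = np.array([5.894402834, 3.555348061, 1.609437912])
targets = residual_targets(prices, weights, window=20)
print(targets.shape)  # (184, 3)
```

A quick sanity check of the construction: with a single asset the market return equals the asset return, beta is 1, and the residual is zero.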
