EE5434 final project
Data have been available since Nov. 5 (see the Kaggle website)
Report and source code due: 11:59 PM, Dec. 6th
Full mark: 100 pts.
During the process, you can keep trying new machine learning models to boost the
classification accuracy.
You are encouraged to form groups of size 2 with your classmates so that the team can
implement multiple learning models and compare their performance. If you cannot find a
partner, please post a message on the group discussion board and briefly introduce your
expertise. If you prefer to do this project by yourself, you will get 5 bonus points.
Submission format: The report should be in PDF format. The source code should be in a notebook
file (.ipynb); also save your source code as an HTML file (.html). Thus, there are three files
you need to upload to Canvas. Remember that you must not copy anyone's code; doing so can lead
to failure of this course.
Files and naming rules: If your team has two members, start the file name with G2;
otherwise, use G1. For example, if you have a teammate and the team members are Jackie Lee and
Xuantian Chan, name your files G2-Lee-Chan.xxx. 5 pts will be deducted if the naming rule is not
followed. In your report, please clearly list the group members.
How do we grade your report? We will consider the following factors.
1. You will get 30% (the basic grade) if you correctly apply two learning models to our
classification problem. The accuracy should be much better than random guessing. Your
report should be written in generally correct English, be easy to follow, and include a
clear explanation of your implementation details and a basic analysis of the
results.
2. Factors in grading:
a. Applied/implemented and compared at least 2 different models, showing good
sense in choosing appropriate models (such as NLP-related models).
b. For each model, a clear explanation of the feature-encoding methods, model
structure, etc. Carefully tuned multiple sets of parameters or feature-engineering
methods, and provided evidence of multiple methods to boost the performance.
c. Considered performance metrics beyond accuracy (such as the confusion matrix,
recall, ROC, etc.); a minimal code sketch of these metrics appears after this
section. Carefully compared the performance of different
methods/models/parameter sets, and presented the results using the most
insightful means, such as tables and figures.
d. Well-written reports that are easy to follow/read.
e. Final ranking on Kaggle.
For each factor, we assign unsatisfactory (1), acceptable (2), satisfactory (3), good (4),
or excellent (5). The sum over the factors determines the grade. For example, if student A got 4
good and 1 acceptable for a to e, then A's total score is 4*4+2=18. The full mark for a to e is
25, so A's percentage is 18/25 = 72%.
Note that if the final performances are very close (e.g., 0.65 vs. 0.66), the corresponding
submissions are placed in the same tier of the ranking.
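As a concrete illustration of factor c above, here is a minimal sketch of computing a confusion matrix, recall, and ROC-AUC with scikit-learn; the synthetic dataset and logistic-regression model are stand-ins for your own validation split and classifier, not part of the assignment.

```python
# Minimal sketch, assuming scikit-learn; the synthetic data and the
# logistic-regression model are placeholders for your own pipeline.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = model.predict(X_val)
y_score = model.predict_proba(X_val)[:, 1]  # positive-class probability

print(confusion_matrix(y_val, y_pred))            # per-class error structure
print("recall:", recall_score(y_val, y_pred))     # sensitivity to the positive class
print("ROC-AUC:", roc_auc_score(y_val, y_score))  # threshold-free ranking quality
```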
Factors that can increase your grade:
1. You used a new learning model or feature-engineering method that was not taught in
class (one hypothetical example is sketched after this list). This requires some reading
and a clear explanation of why you think the model fits this problem.
2. Your model's performance is much better than others' because of a new or optimized
method.
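As one hypothetical example of point 1, a TF-IDF encoding is a common NLP-style feature-engineering method; the sketch below shows the general pattern with scikit-learn. The toy documents are placeholders, and whether the competition data are text is an assumption here.

```python
# Minimal sketch, assuming text-like input data; TF-IDF is one example
# of an NLP-oriented feature encoding. The documents are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["sample document one", "sample document two with more words"]
vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
X = vectorizer.fit_transform(docs)  # sparse matrix: documents x n-gram features
print(X.shape, vectorizer.get_feature_names_out()[:5])
```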
The format of the report
1. There is no page limit for the report. If you don't have much to report, keep it simple.
Also, minimize language issues by proofreading.
2. To make our grading more standard, please use the following sections:
a. Abstract. Summarize the report: what you did, what methods you used, and your
conclusions (fewer than 300 words).
b. Data properties (exploratory data analysis). Describe your
understanding/analysis of the data properties.
c. Methods/models. In this section, describe your implemented models and provide
their key parameters. For example, what are the features? If you use kNN,
what is k and how did you compute the distance? If you use an ANN, what is the
architecture? Separate the high-level description of the models from the
tuning of hyperparameters. (A sketch of making such parameters explicit in
code appears after item 3 below.)
d. Experimental results. In this section, compare and summarize the results using
appropriate tables/figures. Simply copying screenshots is acceptable but will
certainly lead to a low mark; instead, you should *summarize* your results. You
can also compare the performance of your model under different
hyperparameters.
e. Conclusion and discussion. Discuss why your models perform well or poorly.
f. Future work. Discuss what you could do if more time is given.
3. For each model you tried, provide the code of the configuration with the best performance.
In your report, you can detail the performance of this model under different parameters.
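As referenced in item c above, one way to make key parameters easy to report is to keep them explicit in the code. Here is a minimal sketch, assuming scikit-learn implementations of kNN and an ANN; the specific values are placeholders, not recommendations.

```python
# Minimal sketch: making the key parameters from Section c explicit.
# All values here are placeholders, not tuned recommendations.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# kNN: report k and the distance metric.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")

# ANN: report the architecture (hidden layer sizes) and activation.
ann = MLPClassifier(hidden_layer_sizes=(128, 64), activation="relu",
                    max_iter=300, random_state=0)

print(knn.get_params())  # a parameter dump like this is handy for the report
print(ann.get_params())
```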
The code
The code should include the following (a minimal end-to-end sketch follows the list):
1. Preprocessing of the data
2. Construction of the model
3. Training
4. Validation
5. Testing
6. And other code that is necessary
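A minimal end-to-end skeleton covering steps 1-5 might look like the following; the file names, column names, and model choice are all assumptions to be replaced with the actual competition files and your own models.

```python
# Minimal sketch of steps 1-5. "train.csv", "test.csv", and the column
# names ("text", "label", "id") are assumptions; substitute the actual
# competition files from Kaggle.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Preprocessing of the data
train, test = pd.read_csv("train.csv"), pd.read_csv("test.csv")
vec = TfidfVectorizer()
X, y = vec.fit_transform(train["text"]), train["label"]

# 2. Construction of the model
model = LogisticRegression(max_iter=1000)

# 3. Training and 4. Validation on a held-out split
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))

# 5. Testing: predict on the competition test set and write a submission
pred = model.predict(vec.transform(test["text"]))
pd.DataFrame({"id": test["id"], "label": pred}).to_csv("submission.csv", index=False)
```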
This is the link that you need to use to join the competition.
https://www.kaggle.com/t/79178536956041b8acb64b6268afb4de