BUSI1125 Softwares and Tools for Data Analytics
INDIVIDUAL ASSIGNMENT
Autumn 2023/24
This individual assignment carries 100% of the total marks of this module.
Students are required to download 2 different datasets, and analyse each dataset using a
randomly assigned data analytics software.
Dataset 1 (poverty): Eradicating extreme poverty for all people everywhere by 2030 is a
pivotal goal of the 2030 Agenda for Sustainable Development. It has been recognised that
ending poverty must go hand-in-hand with strategies that build economic growth and address
a range of social needs including education, health, social protection, and job opportunities,
while tackling climate change and environmental protection. As a data analyst, your objective
is to conduct an exploratory analysis to better understand the relationships/associations
between the level of income (outcome) and the selected socio-economic factors (features).
Dataset 1, extracted from the World Bank Development Indicators, includes the following
variables for 151 countries.
Variable Name        Description
country              Name of the country
region               Region of the country
comp_edu             Compulsory education, duration (years)
female_labour        Ratio of female to male labour force participation rate (%)
agri_value_added     Agriculture, forestry, and fishing, value added (% of GDP)
political_stability  Political Stability and Absence of Violence/Terrorism: Estimated index
income_group         Income group classification by the World Bank based on gross national income (GNI) per capita (High income, Upper-middle income, Lower-middle income, Low income)
Dataset 1 is available on the module Moodle page or can be downloaded directly from:
https://raw.githubusercontent.com/mmchit/poverty/main/poverty.csv
Dataset 2 (wage): One of the other UN Sustainable Development Goals is about promoting
inclusive and sustainable economic growth, employment and decent work for all (Decent work
and Economic Growth). Decent work means opportunities for everyone to get work that is
productive and delivers a fair income, security in the workplace and social protection for
families, better prospects for personal development and social integration. As a data analyst,
your objective is to conduct an exploratory analysis to better understand the
relationships/associations between the individual’s wage (outcome) and the selected
demographic factors (features).
Dataset 2, extracted from The United States National Longitudinal Surveys, includes the
following variables for 935 individuals.
Variable Name  Description
wage           Average weekly earnings (in US$)
hours          Average weekly working hours
exper          Years of working experience
age            Age in years
marital        Marital status (Married, Single)
gender         Gender (Male, Female)
education      Level of education (High School, College, Graduate, Post-Graduate)
Dataset 2 is available on the module Moodle page or can be downloaded directly from:
https://raw.githubusercontent.com/mmchit/wage/main/wage.csv
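As an illustration of the import step in Python, the following is a minimal sketch that uses only the standard library. The helper names `parse_csv` and `fetch_csv` are hypothetical, the UTF-8 encoding is an assumption about the files, and the sample rows are made up for an offline demonstration; they are not taken from the real data. Students assigned Python may well prefer a data analysis library instead.

```python
import csv
import io
import urllib.request

# URLs as given in the assignment brief.
POVERTY_URL = "https://raw.githubusercontent.com/mmchit/poverty/main/poverty.csv"
WAGE_URL = "https://raw.githubusercontent.com/mmchit/wage/main/wage.csv"


def parse_csv(text):
    """Parse CSV text into a list of dicts, one per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url):
    """Download a CSV file and parse it (needs network access; UTF-8 is assumed)."""
    with urllib.request.urlopen(url) as response:
        return parse_csv(response.read().decode("utf-8"))


# Offline demonstration with made-up rows (NOT the real data).
sample_text = (
    "wage,hours,exper,age,marital,gender,education\n"
    "700,40,10,30,Married,Male,College\n"
    "650,45,8,28,Single,Female,High School\n"
)
demo_rows = parse_csv(sample_text)
print(len(demo_rows), list(demo_rows[0].keys()))

# With network access, the real files could be read with:
#   poverty_rows = fetch_csv(POVERTY_URL)
#   wage_rows = fetch_csv(WAGE_URL)
```

Every value is read as a string, so numeric columns such as wage need converting before any statistics are computed.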
Assignment requirements
Students are required to import each dataset and analyse it with the assigned software (R or
Python). For descriptive and exploratory analytics and interpretations, students are required
to:
1. check data quality issues (missing values, data entry errors, inconsistencies, etc.),
perform necessary data cleansing, and briefly explain your data cleaning strategy.
2. identify the type of each variable, and provide appropriate summary statistics (all measures
of location and dispersion, and frequencies) for each variable with appropriate
visualisations and interpretations.
3. identify the objectives of analytics based on the given dataset and scenario and identify
the relevant/appropriate relationships/associations between the outcome and feature
variables, conduct exploratory analysis with appropriate visualisations, and present
and interpret the analyses (based on DIKW pyramid).
4. write up a data analytics report with clear and effective communication.
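Requirements 1 to 3 above can be sketched in Python as follows. This is a minimal, standard-library illustration under stated assumptions, not a model answer: the helper names are hypothetical, the rows are made up (they are NOT the real data), and title-casing is only one possible cleaning rule for labels such as gender. Visualisations are left out and would be produced with the plotting tools of the assigned software.

```python
import statistics
from collections import Counter

# Made-up illustrative rows (NOT the real data); values are strings, as read from a CSV.
rows = [
    {"wage": "700", "gender": "Male"},
    {"wage": "650", "gender": "Female"},
    {"wage": "", "gender": "female"},      # missing wage, inconsistent label case
    {"wage": "900", "gender": "Male"},
    {"wage": "750", "gender": " Female"},  # stray leading whitespace
]


def count_missing(rows, column):
    """Count rows whose value in `column` is absent, empty, or whitespace-only."""
    return sum(1 for r in rows if not (r.get(column) or "").strip())


def clean_category(value):
    """Normalise a label by trimming whitespace and title-casing (an assumed rule)."""
    return value.strip().title()


def numeric_summary(values):
    """Measures of location and dispersion for a list of numbers."""
    # The quartiles (and so the IQR) depend on the interpolation method;
    # statistics.quantiles uses the 'exclusive' method by default.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,
    }


def group_means(rows, group_col, value_col):
    """Mean of a numeric column within each cleaned category, skipping missing values."""
    groups = {}
    for r in rows:
        if not r[value_col].strip():
            continue
        groups.setdefault(clean_category(r[group_col]), []).append(float(r[value_col]))
    return {group: statistics.mean(vals) for group, vals in groups.items()}


wages = [float(r["wage"]) for r in rows if r["wage"].strip()]
gender_counts = Counter(clean_category(r["gender"]) for r in rows)

print("missing wage values:", count_missing(rows, "wage"))
print("gender frequencies:", dict(gender_counts))
print("wage summary:", numeric_summary(wages))
print("mean wage by gender:", group_means(rows, "gender", "wage"))
```

Here the row with a missing wage is simply skipped; whether to drop, impute, or correct such rows is part of the cleaning strategy that the report is expected to explain.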
The 1500-word assignment should include the following two sub-sections.
Section 1: Report of descriptive and exploratory analytics of Dataset 1 using the
assigned software with appropriate visualisations, and interpretations (around 750
words)
Section 2: Report of descriptive and exploratory analytics of Dataset 2 using the
assigned software with appropriate visualisations, and interpretations (around 750
words)
Students are also required to submit their R scripts and Jupyter Notebook files via the Moodle
submission box.
Deadline Date for Submission of Coursework
Your coursework needs to be submitted electronically via the Module Moodle page. See the
Student Services website and the programme handbook for further details of this process.
The deadline for coursework submission is 3:30pm on Wednesday, 27th of December
2023. Late submission will incur a mark deduction penalty unless an extension has been
approved by Student Services. Please familiarise yourself with the extenuating circumstances
policy and process for submitting a claim.
Five marks will be deducted for each working day (or part thereof) if coursework is submitted
after the official deadline without an extension having been obtained. Except in exceptional
circumstances, late submission penalties will apply automatically unless a claim for
extenuating circumstances is made before the assessment deadline.
Coursework Submission Requirements:
The maximum word count for the assignment is 1500 words, and this limit must be adhered to.
The penalty for exceeding this limit is a 5-mark deduction for exceeding it by up to 300
words, a 10-mark deduction for exceeding it by between 301 and 500 words, and a 15-mark
deduction for exceeding it by 501 words or more.
The actual word count of the assignment must be stated by the student on the first
page (cover sheet) of the assignment.
The overall word count does include citations and quotations.
The overall word count does not include the references or bibliography at the
end of the coursework.
The word count does not include figures and tables with numeric values, or the titles
of figures and tables. Any statement, interpretation, or explanation presented in
a figure or in tabular form will be included in the overall word count.
Appendices (mostly supporting materials that are not directly related to the assignment
and will not be considered in marking) are not included in the overall word count.
Students should prepare and submit their coursework assessments via Moodle in
the following format:
Font: Verdana 11 point
Spacing: 1.5 spaced
Margins: Normal (2.5 cm)
Referencing: Harvard citation style
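For planning purposes, the word-count and late-submission penalties described earlier in this brief can be expressed as a short Python sketch. The function names are hypothetical, and two readings are assumptions: an excess of 501 words or more is treated as the top band, and lateness is measured in working days after the deadline. The brief itself remains the authoritative statement of the rules.

```python
import math


def word_count_penalty(word_count, limit=1500):
    """Marks deducted for exceeding the word limit, using the bands in the brief."""
    excess = word_count - limit
    if excess <= 0:
        return 0
    if excess <= 300:
        return 5
    if excess <= 500:
        return 10
    return 15  # assumption: an excess of 501 words or more falls in the top band


def late_penalty(working_days_late):
    """Five marks for each working day, or part of a working day, after the deadline."""
    if working_days_late <= 0:
        return 0
    return 5 * math.ceil(working_days_late)


print(word_count_penalty(1650))  # 150 words over the limit
print(late_penalty(1.5))         # counted as two working days
```

Neither function accounts for approved extensions or extenuating circumstances, which are handled by Student Services.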
Plagiarism will not be tolerated. Please consult the Business School Undergraduate Student
Handbook for more guidelines on how to present and submit your essays. It is the strong
advice of the Business School that you should avoid plagiarism by engaging in ethical and
professional academic practice.
In accordance with the University’s Quality Manual, in normal circumstances, marked
coursework and associated feedback will be returned to you within 15 working days of the
published submission deadline. Therefore, students submitting work before the published
deadline should not have an expectation that early submission will result in earlier return of
work. Where coursework will not be returned within 15 working days for good reason (for
example in circumstances where a student has been granted an extension, illness of module
convenor, or lengthy pieces of coursework), students will be informed of the timescale for the
return of the coursework and associated feedback.
Additional circumstances where coursework may not be returned within 15 working days for
good reason can include the University closure dates. Therefore, where this applies, you will
be informed in advance of the date coursework feedback will be provided to you.