
Ghostwriting: MLDS 421: Data Mining

Date: 2024-02-21  Source: 合肥网 hfw.cc  Author: hfw.cc


Individual Assignment (100 points)

Instructions:

• Submit the paper review as a Word or PDF file.

• Submit code as a Python notebook (.ipynb) file along with the HTML version.

• Write elegant code with substantial comments. If you have referred to or reused code from a website, add the links as references.

1. Paper Review: following the guidelines, review any one of the technical papers from Group 2. (20)

2. Generate random multidimensional (n=1000, D >= 15) data using sklearn. (20)

• Build a K-means function from scratch (without using sklearn) and make assumptions to simplify the code as needed.

• Use the elbow method to find an appropriate value for k

• Use the silhouette plot to evaluate your clusters

• Re-cluster the data to see if you can improve your results

• Perform PCA on the original dataset and retain the most important PCs.

• Run K-means on the PCA output and compare the results with the original run in terms of cluster quality and time taken.
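The steps in item 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the data come from `make_blobs` with 4 centers, centroids are initialised from random data points, `k = 4` is assumed to have been read off the elbow curve, and PCA keeps enough components to explain 90% of the variance. The function name `kmeans` and all thresholds are choices made for this sketch.

```python
import time

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Simplifying assumption: start from k distinct, randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment and within-cluster sum of squares for the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids, d2.min(axis=1).sum()


# Random multidimensional data: n = 1000, D = 15 (4 centers is an assumption).
X, _ = make_blobs(n_samples=1000, n_features=15, centers=4, random_state=0)

# Elbow method: inertia for a range of k; plot these and look for the bend.
inertias = {k: kmeans(X, k)[2] for k in range(1, 9)}

# PCA via SVD on the centred data, keeping components up to 90% explained variance.
Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()
n_pc = int(np.searchsorted(np.cumsum(explained), 0.90) + 1)
X_pca = Xc @ Vt[:n_pc].T

# Compare quality (silhouette) and wall-clock time on original vs. PCA data.
k = 4  # assumed to be the elbow for this synthetic data
t0 = time.perf_counter()
labels_full, _, _ = kmeans(X, k)
time_full = time.perf_counter() - t0

t0 = time.perf_counter()
labels_pca, _, _ = kmeans(X_pca, k)
time_pca = time.perf_counter() - t0

sil_full = silhouette_score(X, labels_full)
sil_pca = silhouette_score(X_pca, labels_pca)
print(f"full: silhouette={sil_full:.3f}, time={time_full:.4f}s")
print(f"PCA ({n_pc} PCs): silhouette={sil_pca:.3f}, time={time_pca:.4f}s")
```

Random initialisation can land in a poor local optimum, so re-clustering with several seeds and keeping the run with the lowest inertia is one simple way to try to improve the result.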

3. Using the data from 2, perform hyperparameter optimization for the following clustering algorithms. (20)

• Agglomerative hierarchical clustering (number of clusters, linkage criterion)

• Density-based clustering (DBSCAN) (eps, minPts)

• Model-based clustering (GMM) (number of clusters)
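One possible way to run these searches is a small grid per algorithm, scored with the silhouette coefficient (agglomerative, DBSCAN) or BIC (GMM). The grids, the scoring choices, and the decision to score DBSCAN only on non-noise points are assumptions of this sketch; the assignment does not prescribe them.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Same kind of synthetic data as in item 2 (4 centers is an assumption).
X, _ = make_blobs(n_samples=1000, n_features=15, centers=4, random_state=0)

# Agglomerative clustering: number of clusters x linkage criterion.
agglo_scores = {}
for k in range(2, 7):
    for linkage in ("ward", "complete", "average", "single"):
        labels = AgglomerativeClustering(n_clusters=k, linkage=linkage).fit_predict(X)
        agglo_scores[(k, linkage)] = silhouette_score(X, labels)
best_agglo = max(agglo_scores, key=agglo_scores.get)

# DBSCAN: eps x min_samples (minPts). Label -1 marks noise points.
dbscan_scores = {}
for eps in (2.0, 3.0, 4.0, 5.0):
    for min_samples in (5, 10, 20):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        mask = labels != -1
        # Silhouette needs at least two clusters; skip degenerate settings.
        if len(set(labels[mask])) < 2:
            continue
        dbscan_scores[(eps, min_samples)] = silhouette_score(X[mask], labels[mask])
best_dbscan = max(dbscan_scores, key=dbscan_scores.get) if dbscan_scores else None

# GMM: number of components, chosen by the lowest BIC.
gmm_bic = {
    k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
    for k in range(1, 7)
}
best_gmm = min(gmm_bic, key=gmm_bic.get)

print("agglomerative:", best_agglo)
print("DBSCAN:", best_dbscan)
print("GMM components:", best_gmm)
```

Scoring DBSCAN on non-noise points alone can flatter settings that discard many points as noise, so it is worth reporting the noise fraction next to the score.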

4. Data mining and cluster analysis of the following dataset. (40)

https://data.cdc.gov/NCHS/NCHS-Injury-Mortality-United-States/vc9m-u7tv/about_data

The dataset contains the number of injury deaths per year in the US from 1999 to 2016, broken down by injury intent. The data are also grouped by age group, gender, race, and injury intent.

As a data science consultant, your goal is to mine the dataset and extract meaningful insights for your clients in the health care industry. The course of action is as follows:

• Review and understand the structure of the data.

o Columns are year, sex, age group, race, injury mechanism, injury intent, deaths, population, age-specific rate, and the statistics of the age-specific rate.

• Data Transformation

o For each year, group by age group, sex, or race and summarize data as needed for subsequent analysis.

• Exploratory Data Analysis (10)

o Create statistical summaries.

o Create boxplots, correlation/pairwise plots.

o Perform basic outlier analysis.
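The transformation and outlier steps above might look like the following with pandas. The rows are made-up toy values, not real CDC figures, and the snake_case column names are assumptions; the real export should be loaded and its column names checked first. The 1.5 x IQR rule is one common choice for basic outlier analysis, not the only one.

```python
import pandas as pd

# Toy rows with invented numbers (NOT real data); column names are assumed.
df = pd.DataFrame({
    "year": [1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000],
    "sex": ["Female", "Male", "Female", "Male"] * 2,
    "injury_intent": ["Suicide", "Suicide", "Homicide", "Homicide"] * 2,
    "deaths": [10, 40, 5, 20, 12, 44, 6, 300],
})

# Data transformation: total deaths per year and sex.
by_year_sex = df.groupby(["year", "sex"], as_index=False)["deaths"].sum()

# Statistical summary of the raw death counts.
summary = df["deaths"].describe()

# Basic outlier analysis with the 1.5 * IQR rule.
q1, q3 = df["deaths"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["deaths"] < q1 - 1.5 * iqr) | (df["deaths"] > q3 + 1.5 * iqr)]

print(by_year_sex)
print(summary)
print(outliers)
```

Only deaths are summed here: in a table where the population repeats on every injury-intent row, summing population across intents would double count it.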

• Clustering (15)

o In a few lines create a plan that describes the 3-4 questions that are suitable for cluster analysis.

o List the clustering algorithm(s) you would use and why, e.g., K-means, K-medians, K-modes, hierarchical methods, DBSCAN, etc.

o Apply the above algorithms to the filtered dataset based on your plan.

o Report on the quality of the clusters, pros/cons, and summarize your findings.

• Bias/Fairness Questions (15)

Data

o In the dataset under study, from a bias/fairness (b/f) perspective, there are 2 sensitive features: race and gender.

o Analyze the data by a combination (2) of features (sensitive and other). Example features to include in the analysis: location (county, state), and other features you consider relevant. Although these features may not be considered sensitive, they can act as proxies for sensitive features.

o Determine feature groupings that are relevant for your analysis and explain your choices.

o Do you detect bias in the data?

o Present the results visually to show salient insights with respect to bias.

o Based on the EDA and your project objective, develop a hypothesis about where b/f issues could arise in the modeling (cluster analysis).

Modeling

o Based on your hypothesis, assess the fairness of your model/analysis by applying the fairness-related metrics that are available in any of the following tools: Python Fairlearn package, R Fairness/Fairmodels package, or other similar tools.

o Explain the reasoning for the groups that you selected for the fairness metrics.

o Compare the fairness metrics for the different groups.

o If you developed multiple models, compare the fairness metrics across the models.

o Comment on the results.

o Suggest how the bias/fairness issues could be mitigated.

o Present the results visually to show salient insights.
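As a rough illustration of comparing a fairness metric across groups, the sketch below computes per-group selection rates and the demographic parity difference and ratio by hand. The data frame is hypothetical: it assumes a cluster analysis has already flagged some records as belonging to a "high risk" cluster, and the column names `sex` and `high_risk_cluster` are invented for this example. Packages such as Fairlearn provide comparable metrics; the manual version is shown only to make the arithmetic explicit.

```python
import pandas as pd

# Hypothetical clustering output: 1 = record assigned to the "high risk" cluster.
result = pd.DataFrame({
    "sex": ["Female"] * 4 + ["Male"] * 4,
    "high_risk_cluster": [1, 0, 0, 0, 1, 1, 1, 0],
})

# Selection rate: share of each group assigned to the flagged cluster.
selection_rate = result.groupby("sex")["high_risk_cluster"].mean()

# Demographic parity difference (0 = parity) and ratio (1 = parity).
dp_difference = selection_rate.max() - selection_rate.min()
dp_ratio = selection_rate.min() / selection_rate.max()

print(selection_rate)
print(f"demographic parity difference: {dp_difference:.2f}")
print(f"demographic parity ratio: {dp_ratio:.2f}")
```

A large difference does not by itself show that the model is unfair: underlying rates may genuinely differ between groups, so the metric is best read alongside the EDA findings and the stated hypothesis.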

Note: In the Fall Quarter you attended lectures on Bias/Fairness. Additionally, the following is a useful resource for analyzing b/f in data and modeling: Fairness & Bias Metrics
