合肥生活安徽新闻合肥交通合肥房产生活服务合肥教育合肥招聘合肥旅游文化艺术合肥美食合肥地图合肥社保合肥医院企业服务合肥法律

CISC3025代做、代写Java,c++设计编程

时间:2024-02-07  来源:合肥网hfw.cc  作者:hfw.cc 我要纠错



CISC3025 - Natural Language Processing
Project #1, 2023/2024
(Due date: 5th February, 2024)
Project Rule
This is an individual course project. You are strongly recommended to commence work on
each assignment task of the project soon after it is announced in class/UMMoodle. Students are
free to discuss the project, but they are not permitted to share any code and report.
Problem Description
This assignment asks you to implement a sequence comparison algorithm (e.g., Levenshtein
Distance). Given   = "AACGCA" and   = "GAGCTA", the objective is to match identical
subsequences as far as possible through alignment. It can be seen as a way to transforming one
sequence into the other with the substitution, insertion, and deletion of characters. The cost of
operations is considered as:
Ÿ    ( ,  ) = 0       ∈ ∑;
Ÿ    ( ,  ) = 2      ,   ∈ ∑       ≠  ;
Ÿ    ( ) =    ( ) = 1       ∈ ∑.
In the following example, three operations are applied for aligning the two sequences, i.e.,
   ( ,  ),    ( ), and    ( ). Hence, the minimum cost for such transformation is 4.
 M
The similarity of two sequences can be defined as the best score among possible alignment
between them, i.e. the minimum cost or minimum edit distance. The computation of such
alignment between two sequences can be efficiently solved by using dynamic programming
approach based on scoring matrix (Table 1):
DynamicProgramming(x, m, y, n)
1. T[-1,-1] ¬ 0
2. for j ¬ 0 to n - 1
3. do T[-1, j] ¬ T[-1, j - 1] + Ins(yj)
4. for i ¬ 0 to m - 1
5. do T[i, -1] ¬ T[i -1, - 1] + Del(xi)
6. for j ¬ 0 to n - 1
7. do T[i, j] ¬ min{ T[i-1, j - 1] + Sub(xi, yj),
8. T[i-1, j] + Del(xi),
9. T[i, j - 1] + Ins(yj)}
10. return T[m - 1, n - 1]
 ( , ) # G A G C T A
Table 1. Scoring matrix
More information regarding dynamic programming and scoring matrix can be found in Chapter
1 & Chapter 2 of [1] and [2].
Requirements
1. You are asked to implement the dynamic programming algorithm in Python. Input to the
program are two strings and the minimum cost is output as the comparison result, followed
by a possible alignment between the two strings.
The following shows a scenario of the input and outputs:
> AACGCA
> GAGCTA
The cost is: 4
An possible alignment is:
A A C G C - A
| | | | | | |
G A – G C T A
2. Extend your program to deal with sentence by taking words as the comparison units instead
of letters.
The following shows a scenario of the input and outputs:
> I love natural language processing
> I really like natural language processing course
The cost is: 4
An possible alignment is:
I love − natural language processing −
| | | | | | |
I really like natural language processing course
3
3. Write a function to compute the similarities between words in batch mode and store your
results in a file.
In the input file “word_corpus.txt”, each row contains a word and a symbol, ‘R’, or ‘H’,
indicating the correct Reference and the Hypothesis, respectively. Your program compares
each hypothesis to the reference, and appends the minimum edit distance to the
corresponding hypothesisin the output file, as shown in the following diagram. The number
of the hypotheses for each reference may be varied. The name of the output file should be
“word_edit_distance.txt”.
4. Write a similar function to compute the similarities between sentences in batch mode
“sentence_corpus.txt” and store your results in a file “sentence_edit_distance.txt”. The
References and Hypotheses are arranged in a similar way as in Requirement (3). Note, the
number of hypotheses for each reference is constant.
The Starter Code
The starter code is in the edit_distance.py. To make it easier for you to do this project, we
provide a starter code written in python. If you enter into the folder “Assignment#1” and
execute the following command:

The program will execute the function word_edit_distance( ) to calculate the edit distance
and the alignment, then output the result to the command line using the output_alignment
function( ).
Similarly, you can use the following command to test your implemented
sentence_edit_distance( ) function:
For Requirements (3) and (4), you can run the following command to specify the name of
input and output files:
Input file:
R satisfaction
H satisfacion
H satesfaction
H satisfation
H satiusfacson
.
.
.
Output file:
R satisfaction
H satisfacion 1
H satesfaction 2
H satisfation 1
H satiusfacson 4
.
.
.
$python edit_distance.py -w ‘word1’ ‘word2’
$python edit_distance.py -s ‘sentence1’ ‘sentence2’
4
The output_alignment( ) function has been already implemented to show the alignments in
a proper format.
Submissions
You need to submit the following materials:
1. Runnable program and source code;
2. A brief report containing the following contents:
• Introduction: Clearly state the goal of your project. Explain why the project is both
important and interesting in the context of NLP.
• Background: Briefly introduce one or two fundamental NLP concepts that are central
to your project.
• Approach & Challenges: Summarize your methodological approach in one concise
paragraph. Identify one significant challenge you encountered and describe how you
addressed it.
• Results: Summarize the outcomes of your project, highlighting the main findings.
• Conclusion: Reflect briefly on what you learned from the project and what was
accomplished.
3. The output files.
References
[1] C. Charras and T. Lecroq, Sequence Comparison. Université de Rouen.
(https://www.researchgate.net/profile/Thierry_Lecroq/publication/2783325_Sequence_Com
parison/links/09e415108d23e64eb7000000.pdf)
[2] http://ultrastudio.org/en/Dynamic%20programming%20table
$python edit_distance.py -bw ‘inputfile’ ‘outputfile’
$python edit_distance.py -bs ‘inputfile’ ‘outputfile’

如有需要,请加QQ:99515681 或WX:codehelp

扫一扫在手机打开当前页
  • 上一篇:代做Micro Language Compiler
  • 下一篇:CS 61程序代做、代写C/C++编程设计
  • 无相关信息
    合肥生活资讯

    合肥图文信息
    新能源捕鱼一体电鱼竿好用吗
    新能源捕鱼一体电鱼竿好用吗
    海信罗马假日洗衣机亮相AWE  复古美学与现代科技完美结合
    海信罗马假日洗衣机亮相AWE 复古美学与现代
    合肥机场巴士4号线
    合肥机场巴士4号线
    合肥机场巴士3号线
    合肥机场巴士3号线
    合肥机场巴士2号线
    合肥机场巴士2号线
    合肥机场巴士1号线
    合肥机场巴士1号线
    合肥轨道交通线路图
    合肥轨道交通线路图
    合肥地铁5号线 运营时刻表
    合肥地铁5号线 运营时刻表
  • 币安app官网下载

    关于我们 | 打赏支持 | 广告服务 | 联系我们 | 网站地图 | 免责声明 | 帮助中心 | 友情链接 |

    Copyright © 2024 hfw.cc Inc. All Rights Reserved. 合肥网 版权所有
    ICP备06013414号-3 公安备 42010502001045