Calculates the similarity between pairs of Japanese sentences. This program can be used to inspect whether a sentence was machine translated or not.
- calcTranslationSimilarity.bat: batch file for demo
- calcTranslationSimilarity.py: main program
- sentences.csv: input file for the batch file
- sentences_out.csv: output file with sentence similarity
- Windows 10 x64
- Anaconda 5.2.0 (conda 4.9.2)
- Python 3.8.5
You need to install the MeCab library first. For installation with Anaconda on Windows 10, please refer to: https://emotionexplorer.blog.fc2.com/blog-entry-349.html
$ conda install -c mzh mecab-python3
$ conda install -c conda-forge unidic-lite

Run calcTranslationSimilarity.bat for demo.
- You can switch between Normal mode and Important mode. Normal mode is based on ordinary 'wakati' (word-segmented) sentence separation, while Important mode uses only the important components (verbs, nouns, adjectives, etc.). The default is Important mode.
# similarity_score = o_mecab.calcTranslationSimilarity_normal(original_translation, other_translations)
similarity_score = o_mecab.calcTranslationSimilarity_important(original_translation, other_translations)
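
To illustrate how the two modes can give different scores, here is a minimal sketch. It does not reproduce the scoring in calcTranslationSimilarity.py. It assumes the sentences are already tokenized into (surface, feature) pairs shaped like MeCab's node.surface and node.feature, and it uses Jaccard overlap as a stand-in similarity measure. The names surfaces_normal, surfaces_important and jaccard are hypothetical.

```python
from typing import List, Set, Tuple

# A token is (surface, feature); feature mimics MeCab's comma-separated
# node.feature string, whose first field is the part of speech.
Token = Tuple[str, str]

# Parts of speech kept in "Important" mode (taken from the README).
IMPORTANT_POS: Set[str] = {"名詞", "動詞", "形容詞", "形容動詞"}


def surfaces_normal(tokens: List[Token]) -> List[str]:
    """Normal mode: keep every token produced by the segmentation."""
    return [surface for surface, _ in tokens]


def surfaces_important(tokens: List[Token]) -> List[str]:
    """Important mode: keep only tokens whose first POS field is of interest."""
    return [s for s, feature in tokens if feature.split(",")[0] in IMPORTANT_POS]


def jaccard(a: List[str], b: List[str]) -> float:
    """Assumed stand-in score: |A ∩ B| / |A ∪ B| over the sets of tokens."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Two hand-tokenized sentences that differ only in a particle.
original = [("猫", "名詞,一般"), ("が", "助詞,格助詞"), ("走る", "動詞,自立")]
other = [("猫", "名詞,一般"), ("は", "助詞,係助詞"), ("走る", "動詞,自立")]

score_normal = jaccard(surfaces_normal(original), surfaces_normal(other))
score_important = jaccard(surfaces_important(original), surfaces_important(other))
print(score_normal, score_important)
```

In this sketch the particle difference lowers the Normal-mode score, while Important mode ignores particles and treats the two sentences as identical.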
- You can change which components Important mode treats as important by editing the condition below.
if node.feature.split(",")[0] == "名詞" or node.feature.split(",")[0] == "動詞" or node.feature.split(",")[0] == "形容詞" or node.feature.split(",")[0] == "形容動詞":
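
If you change the components often, the condition above could be expressed as a membership test against a configurable collection. The following is a sketch under that assumption, not code from calcTranslationSimilarity.py. The helper names first_pos and keep_components are hypothetical, and the tokens are hand-written (surface, feature) pairs rather than real MeCab output.

```python
from typing import Iterable, List, Tuple


def first_pos(feature: str) -> str:
    """Return the first comma-separated field of a MeCab-style feature string."""
    return feature.split(",")[0]


def keep_components(
    tokens: Iterable[Tuple[str, str]], pos_of_interest: Iterable[str]
) -> List[str]:
    """Keep the surface of each token whose part of speech is of interest."""
    wanted = frozenset(pos_of_interest)
    return [surface for surface, feature in tokens if first_pos(feature) in wanted]


# Hand-written example tokens (assumed shape, not real tagger output).
tokens = [
    ("美しい", "形容詞,自立"),
    ("花", "名詞,一般"),
    ("が", "助詞,格助詞"),
    ("咲く", "動詞,自立"),
]

# The four parts of speech listed in the README condition.
default_kept = keep_components(tokens, ["名詞", "動詞", "形容詞", "形容動詞"])

# Narrowing the interest to nouns only.
nouns_only = keep_components(tokens, ["名詞"])

print(default_kept)
print(nouns_only)
```

Narrowing or widening the list passed as pos_of_interest has the same effect as adding or removing `or` clauses in the original condition.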
This software is released under the MIT License, see LICENSE.
