This repository contains the data and source code for the paper "Product Review Summarization by Exploiting Phrase Properties".
In the experiment, we use the following list of aspect keywords, grouped into 17 aspects (English glosses in parentheses):
- a1 (appearance): 外观 外形 设计 外型 外壳 外表
- a2 (build quality): 质量 材质 手感 质感 作工 做工
- a3 (screen): 屏幕 触摸屏 显示屏 分辨率 led 触摸板 液晶屏 电阻屏 显示 触屏
- a4 (price): 性价比 价位 价钱 价格 售价
- a5 (system/performance): 系统 稳定性 性能 速度 操作系统 兼容性
- a6 (software): 软件 导航 wifi
- a7 (operability): 操控 操控性 操作性 操作 触控
- a8 (battery): 电池 待机 电量 续航 耗电
- a9 (keyboard/buttons): 键盘 按键 功能键 按钮
- a10 (signal/network): 信号 网络 蓝牙 通话 天线 通信 通讯
- a11 (messaging): 短信 彩信
- a12 (interface): 界面 画面 画质 ui
- a13 (input method): 输入法 手写 输入
- a14 (form factor): 机型 机身 款式 样式
- a15 (camera): 照相 摄像 照像 相机 拍照 镜头 像素 闪光灯 摄像头 照相机 录音
- a16 (audio): 音效 音色 音质 话筒 听筒 扬声器 喇叭 话音 音响 语音 立体声
- a17 (storage/memory): 存储 内存 内存卡 存储卡 储存卡 扩展卡
The aspect keyword list can also be retrieved from `summarizer.model.Aspect`.
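For convenience, the keyword lists can be loaded programmatically. The sketch below assumes a hypothetical accessor `Aspect.getKeywords(int aspectId)`; the actual `summarizer.model.Aspect` class may expose the lists differently (e.g., as public static fields), so check the source.

```java
import java.util.List;
import summarizer.model.Aspect;

public class AspectDemo {
    public static void main(String[] args) {
        // Hypothetical accessor; the real Aspect class may expose the
        // keyword lists under a different name or as static fields.
        for (int aspectId = 1; aspectId <= 17; aspectId++) {
            List<String> keywords = Aspect.getKeywords(aspectId);
            System.out.println("a" + aspectId + ": " + String.join(" ", keywords));
        }
    }
}
```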
The original review data is available at `data/all_reviews/`. Each file corresponds to a cell phone.
The phrases with sentiment polarities are available at `data/phrases_new/`. Each file corresponds to a cell phone.
The summaries generated by our system and by the 3 baselines are available at `data/summary/`. `xxxx_reviewSum_summary.txt` is generated by our system; `xxxx_lexrank_summary.txt`, `xxxx_opinosis_summary.txt`, and `xxxx_basicSum_summary.txt` are generated by the 3 baseline systems described in our paper.
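For example, the four summaries of a single product can be collected by matching the shared file-name prefix. The snippet below assumes the `xxxx` prefix is a per-product identifier; adjust the value to the actual naming in `data/summary/`.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class ListSummaries {
    public static void main(String[] args) throws IOException {
        String productPrefix = "0001"; // hypothetical product identifier ("xxxx")
        // List the four summary files (ours + 3 baselines) for one product.
        try (Stream<Path> files = Files.list(Paths.get("data/summary"))) {
            files.map(p -> p.getFileName().toString())
                 .filter(name -> name.startsWith(productPrefix + "_"))
                 .sorted()
                 .forEach(System.out::println);
        }
    }
}
```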
Task 1 is a pairwise user-preference evaluation and Task 2 is a user-scoring evaluation. In Task 1, we run all 6 pairwise comparisons (C(4,2) = 6) among the 4 summaries generated by our system and the 3 baselines. In Task 2, we ask annotators to evaluate 4 aspects of each summary.
We asked 20 annotators to perform the evaluation: 10 were assigned to Task 1 and 10 to Task 2. All annotators are native Chinese speakers with experience writing product reviews. We constructed the evaluation dataset from customer reviews of 10 cell phones. Each annotator annotated at least 5 products, and for each product in each task at least 5 annotations were performed.
The annotation data is available at `data/evaluation_data/`. The `task1` subfolder contains the annotation data for Task 1, and the `task2` subfolder contains the annotation data for Task 2. Since the name of the summarization algorithm is hidden from annotators, each task item is assigned a UUID; `evaluation.log` stores the mapping between task items and UUIDs.
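The exact format of `evaluation.log` is not documented here. A minimal parsing sketch, assuming one tab-separated UUID/item pair per line and that the log sits under `data/evaluation_data/` (both assumptions; the real format and location may differ):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class EvaluationLog {
    // Assumes each line is "UUID<TAB>taskItem"; the real log format may differ.
    public static Map<String, String> loadUuidMap(Path logFile) throws IOException {
        Map<String, String> uuidToItem = new HashMap<>();
        for (String line : Files.readAllLines(logFile)) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                uuidToItem.put(parts[0].trim(), parts[1].trim());
            }
        }
        return uuidToItem;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the log file.
        Map<String, String> map = loadUuidMap(Paths.get("data/evaluation_data/evaluation.log"));
        map.forEach((uuid, item) -> System.out.println(uuid + " -> " + item));
    }
}
```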
`summarizer.summarizer.ReviewSummarizer`
- `public String getSummary()`
  Returns the summary generated by our system.
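A minimal usage sketch, assuming `ReviewSummarizer` can be constructed from a product ID (the actual constructor signature may differ):

```java
import summarizer.summarizer.ReviewSummarizer;

public class SummaryDemo {
    public static void main(String[] args) {
        // Hypothetical constructor argument: the actual class may take a
        // product ID, a review file path, or other parameters.
        ReviewSummarizer summarizer = new ReviewSummarizer(1);
        System.out.println(summarizer.getSummary());
    }
}
```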
`summarizer.evaluation.EvaluationDataGen` (see the combined usage sketch after this list)
- `public static List<Pair> evaluationPairGenerator(int productID)`
  Generates the evaluation file for Task 1, where `productID` denotes the ID of the product in the original review data.
- `public static Map<String, String> evaluationGenTask2(int productID)`
  Generates the evaluation file for Task 2, where `productID` denotes the ID of the product in the original review data.
- `public void printTask1Statics(String task1ResultDir)`
  Prints the evaluation results of Task 1 to the console, where `task1ResultDir` denotes the directory containing the annotation files of Task 1, e.g., `data/evaluation_data/task1/`.
- `public void printTask2Statics(String task2ResultDir)`
  Prints the evaluation results of Task 2 to the console, where `task2ResultDir` denotes the directory containing the annotation files of Task 2, e.g., `data/evaluation_data/task2/`.
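Putting the evaluation utilities together, a sketch of generating the evaluation files and printing the results might look as follows. The import location of `Pair` and the no-argument `EvaluationDataGen` constructor are assumptions; check the source for the actual package layout.

```java
import java.util.List;
import java.util.Map;
import summarizer.evaluation.EvaluationDataGen;
import summarizer.model.Pair; // assumed location of Pair

public class EvaluationDemo {
    public static void main(String[] args) {
        int productID = 1; // ID of the product in the original review data

        // Generate evaluation files for both tasks (static methods).
        List<Pair> task1Pairs = EvaluationDataGen.evaluationPairGenerator(productID);
        Map<String, String> task2Items = EvaluationDataGen.evaluationGenTask2(productID);
        System.out.println(task1Pairs.size() + " Task 1 pairs, "
                + task2Items.size() + " Task 2 items generated.");

        // Print annotation statistics (instance methods; a no-argument
        // constructor is assumed here).
        EvaluationDataGen gen = new EvaluationDataGen();
        gen.printTask1Statics("data/evaluation_data/task1/");
        gen.printTask2Statics("data/evaluation_data/task2/");
    }
}
```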