National Digital Library of Theses and Dissertations in Taiwan

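Author: 何國華
Author (English): Ronald Kuo-Hua Ho
Thesis title: 以crowdsourcing方式建立SKYPE/SILK網路電話使用者經驗模型: 數據的收集、過濾與分析
Thesis title (English): Crowdsourcing for QoE Models of SKYPE/SILK Calls: An Empirical Study on the Collection, Screening, and Analysis of Data
Advisor: 黃寶儀
Advisor (English): Polly Huang
Committee members: 葉素玲, 陳宏銘
Committee members (English): Su-Ling Yeh, Homer H. Chen
Oral defense date: 2015-06-24
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Communication Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (ROC calendar)
Language: English
Number of pages: 117
Keywords (translated from Chinese): VoIP, user perception, crowdsourcing, psychophysics
Keywords (English): VoIP, Crowdsourcing, User Perception, QoE, Psychophysics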
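Modeling and measuring user experience for VoIP services has long been an open problem. With the help of crowdsourcing platforms, user-experience researchers can run large-scale user studies that draw on a larger and more diverse population, and a large amount of subject data can be collected easily within a limited time. However, the details of deriving a reliable QoE model through crowdsourcing are often overlooked and deserve attention. This study makes three main contributions. First, a three-stage user study was carried out on a crowdsourcing platform: participants rated their satisfaction with several VoIP calls subject to different network delays, and the resulting data were used to build a predictive model. The results show that the effect of end-to-end delay on user experience follows an exponential decay.
Second, this thesis performs a series of analyses on all user data previously collected by the lab and provides methods for checking the reliability of the data. Three main tests are proposed: a cheat-proof test, a normality test, and a convergence test. The cheat-proof test automatically decides, based on a user's scores and information, whether that user's data should be filtered out; the normality test checks whether the distribution of user scores follows a normal distribution; the convergence test uses numerical analysis to examine the convergence of user scores. Third, this thesis cross-compares the results of the three tests on different data sets (bit rate, packet loss rate, and network delay), and analyzes and discusses in detail the effectiveness of the three tests and the characteristics of the data.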


Modeling and measuring user experience for Voice over IP (VoIP) services has long been a subject of study. With the help of crowdsourcing platforms, researchers studying user perception can run user studies on a large and diverse population, and a large amount of subject and score data can be collected in a limited time. However, the details of deriving a reliable QoE model through crowdsourcing have often been neglected and need to be addressed.

This study makes three main contributions. First, 60 participants were recruited to score emulated Skype calls with different levels of delay, and the data of 44 users were adopted to build a closed-form QoE model. The results show that end-to-end delay affects user experience on an exponential scale.
Second, taking all of our previous user studies as an example, a set of analysis and quality-control methodologies for user score data is provided to increase the reliability of the study.
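
As an illustration of the first contribution, an exponential dependence of the score on delay can be fitted by ordinary least squares on the logarithm of the score. The sketch below is not the thesis's model: the functional form `score = a * exp(-b * delay)`, the helper name `fit_exponential`, and the sample values are all assumptions made for this example.

```python
import math


def fit_exponential(delays_ms, scores):
    """Fit score = a * exp(-b * delay) by least squares on log(score).

    Hypothetical helper for illustration only; requires all scores > 0
    and at least two distinct delay values.
    """
    n = len(delays_ms)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two (delay, score) pairs")
    logs = [math.log(s) for s in scores]
    mean_x = sum(delays_ms) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays_ms, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(score) = log(a) - b * delay, so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from a = 5.0 and b = 0.002 (made-up values).
delays = [0, 100, 200, 400, 800]
scores = [5.0 * math.exp(-0.002 * d) for d in delays]
a, b = fit_exponential(delays, scores)
```

With real ratings the log transform weights low scores more heavily, so a nonlinear least-squares fit may be preferable; the log-linear form is used here only because it needs nothing beyond the standard library.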

The proposed methodologies comprise three kinds of test: a cheat-proof test, a normality test, and a convergence test. The cheat-proof test details how users' data were screened based on their rating behavior. The normality test shows that the scores in most tracks are normally distributed. The convergence test examines numerically whether the scores reached a pre-defined convergence criterion. Third, by cross-comparing the results of the three tests, their effectiveness and results are analyzed and discussed across three data sets (bit rate, loss rate, and delay).
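
The abstract does not state the convergence criterion, so the following is only a sketch of what a numerical convergence check on user scores could look like. It assumes a running-mean criterion: the scores for a track are treated as converged when the last few running means all lie within a tolerance of the final mean. The names `running_means` and `has_converged`, the values of `tol` and `window`, and the ratings are hypothetical.

```python
def running_means(scores):
    """Return the mean of the first k scores for k = 1..len(scores)."""
    means, total = [], 0.0
    for k, s in enumerate(scores, start=1):
        total += s
        means.append(total / k)
    return means


def has_converged(scores, tol=0.1, window=5):
    """True if the last `window` running means lie within `tol` of the final mean.

    `tol` and `window` are assumed values, not the criterion used in the thesis.
    """
    means = running_means(scores)
    if len(means) < window:
        return False
    final = means[-1]
    return all(abs(m - final) <= tol for m in means[-window:])


# Made-up ratings on a 1-5 scale for a single test track.
ratings = [4, 2, 5, 3, 4, 3, 4, 3, 4, 4, 3, 4]
converged = has_converged(ratings)
```

A check of this kind gives a stopping rule for data collection: ratings for a track can keep being gathered until the running mean stabilizes.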


Abstract
Contents
Chapter 1 Introduction
1.1 Modelling
1.2 Data screening, collection and analysis
Chapter 2 Literature Reviews
2.1 Measuring QoE for delay
2.2 Issue in Crowdsourcing
2.3 Analysis of data
2.4 Psychology background
Chapter 3 Pilot Experiments
3.1 Evolution of Methodology
3.2 Experiment design-I
3.3 Preliminary Results
3.4 Experiment design-II
3.5 Preliminary Results
Chapter 4 Derived Model
4.1 Model form
Chapter 5 Full-scale experiment
5.1 Experiment design
5.2 ANOVA Tests
5.3 Model specifics
Chapter 6 Data Screening
6.1 Cheat-proof test
6.2 Results of cheat-proof test
6.3 Analysis of outliers
6.4 Analysis of data
6.5 Results of cheat-proof test for all collected data
Chapter 7 Data Convergence
7.1 Convergence test
7.2 Factors of convergence
7.3 Quantifying convergence
7.4 Comparison between screened and noise data
Chapter 8 Normality of Data
8.1 Normality test
8.2 Graph approach
8.2.1 Frequency histogram with normal distribution overlay
8.2.2 CDF and normal
8.3 Hypothesis test
8.4 Why tests reject normality
8.5 Customized T-test
8.6 Discussion
Chapter 9 Data screening and analysis
9.1 Cheat-proof test among data sets
9.2 Convergence test among data sets
9.3 Normality test among data sets
Chapter 10 Discussion
Chapter 11 Conclusion
Reference




