# Polarity Analysis for Sentiment Classification

## Stop Word List

This stop word list is probably the most widely used one. It covers a wide range of common stop words without being so aggressive that it removes words a user might plausibly search for. The list contains only 11 words.

## How to Choose the Top K

### Improvements

#### Combined Classifier

• LM_POS: if LM.classify(x) = POS, then LM_POS = 1; otherwise LM_POS = 0.
• LM_NEG: if LM.classify(x) = NEG, then LM_NEG = 1; otherwise LM_NEG = 0.
• WINNOW_POS: if Winnow.classify(x) = POS, then WINNOW_POS = Winnow.strongClassify(x).
• WINNOW_NEG: if Winnow.classify(x) = NEG, then WINNOW_NEG = Winnow.strongClassify(x).
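A minimal Python sketch of the four meta-features above, for a single input x. The LM and Winnow classifiers are represented only by their outputs, passed in as plain arguments; the function name `combined_features` is hypothetical, and setting the Winnow feature that does not match the predicted label to 0 is an assumption made by analogy with the LM features (the list above does not state it).

```python
from typing import List

POS, NEG = "POS", "NEG"


def combined_features(lm_label: str, winnow_label: str, winnow_strong: float) -> List[float]:
    """Return [LM_POS, LM_NEG, WINNOW_POS, WINNOW_NEG] for one input x.

    lm_label      -- result of LM.classify(x)
    winnow_label  -- result of Winnow.classify(x)
    winnow_strong -- result of Winnow.strongClassify(x)
    """
    # LM features are 0/1 indicators of the predicted label.
    lm_pos = 1.0 if lm_label == POS else 0.0
    lm_neg = 1.0 if lm_label == NEG else 0.0
    # Winnow features carry the strength score on the predicted side only.
    # Assumption (hypothetical): the non-matching side is 0.
    winnow_pos = winnow_strong if winnow_label == POS else 0.0
    winnow_neg = winnow_strong if winnow_label == NEG else 0.0
    return [lm_pos, lm_neg, winnow_pos, winnow_neg]
```

For example, `combined_features("POS", "NEG", 0.4)` yields `[1.0, 0.0, 0.0, 0.4]`: the two base classifiers disagree, and the downstream combiner sees both opinions together with Winnow's confidence.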

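strongClassify(x) first takes the maximum value of h(x) observed on the training data and then scales the decision function by it: strongClassify(x) = h(x) / TRAINING_MAX_H_VALUE. We do not use strongClassify(x) = h(x) directly because the raw value easily causes overflow or over-adjusts the decision. After the experiments, the raw formula was replaced with the normalized one.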
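The normalization can be sketched as follows. This assumes h(x) is the Winnow activation collected beforehand for every training example and that it is non-negative (as in standard Winnow with positive weights and non-negative features); the helper names `training_max_h` and `strong_classify` are hypothetical.

```python
from typing import Iterable


def training_max_h(train_h_values: Iterable[float]) -> float:
    """TRAINING_MAX_H_VALUE: the largest h(x) observed on the training data."""
    max_h = max(train_h_values)
    if max_h <= 0:
        raise ValueError("training maximum of h(x) must be positive to normalize by it")
    return max_h


def strong_classify(h_x: float, training_max_h_value: float) -> float:
    """strongClassify(x) = h(x) / TRAINING_MAX_H_VALUE.

    On the training set the result is at most 1; an unseen x whose h(x)
    exceeds the training maximum can still score above 1.
    """
    return h_x / training_max_h_value
```

Usage: with training activations `[2.0, 5.0, 4.0]`, `training_max_h` returns `5.0`, and an input with h(x) = 3.0 gets `strong_classify(3.0, 5.0)`, about 0.6, instead of the raw 3.0.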

#### N-gram Score

$\chi^{2}(t, c) = \frac{N \times (AD - CB)^{2}}{(A + C) \times (B + D) \times (A + B) \times (C + D)} \times \mathrm{Weight}[\texttt{t.getSize()}] \times \mathrm{Score}(t)$
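A sketch of the score above. It assumes the usual contingency-table reading of the symbols (A: documents of class c containing t; B: documents outside c containing t; C: documents of class c without t; D: documents outside c without t; N = A + B + C + D). It also assumes `t.getSize()` is the number of tokens in the n-gram and that `Weight` and `Score` are supplied by the caller; the function names and the weight values in the usage line are hypothetical.

```python
from typing import Callable, Dict, Tuple

NGram = Tuple[str, ...]


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """2x2 chi-square statistic for n-gram t and class c (counts A, B, C, D)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (t or c never/always occurs): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def ngram_score(
    t: NGram,
    counts: Tuple[int, int, int, int],
    weight: Dict[int, float],
    score: Callable[[NGram], float],
) -> float:
    """chi^2(t, c) * Weight[t.getSize()] * Score(t), with len(t) as getSize()."""
    a, b, c, d = counts
    return chi_square(a, b, c, d) * weight[len(t)] * score(t)
```

For example, `chi_square(40, 10, 10, 40)` is `36.0`, and `ngram_score(("not", "good"), (40, 10, 10, 40), {1: 1.0, 2: 1.5}, lambda t: 2.0)` is `108.0`. Ranking n-grams by this value and keeping the best ones is one way to pick the top K.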

#### Vector

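$\mathit{vector}[i] = \mathrm{Score}(\mathit{ngrams}(i)) + \sqrt{\mathrm{count}(\mathit{ngrams}(i))}$

where $\mathrm{count}(\mathit{ngrams}(i))$ is the number of times the $i$-th n-gram appears.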
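A sketch of building this vector for one tokenized sentence. It assumes `selected` holds the chosen n-grams in vector order, that counts are taken within the sentence itself, and that n-grams go up to bigrams; `extract_ngrams`, `feature_vector`, and `max_n` are hypothetical names. The formula is implemented literally, so an n-gram that does not occur still contributes its Score.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

NGram = Tuple[str, ...]


def extract_ngrams(tokens: Sequence[str], max_n: int) -> List[NGram]:
    """All contiguous n-grams of length 1..max_n, in order."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def feature_vector(
    tokens: Sequence[str],
    selected: List[NGram],
    score: Callable[[NGram], float],
    max_n: int = 2,
) -> List[float]:
    """vector[i] = Score(ngrams(i)) + sqrt(count(ngrams(i)))."""
    counts = Counter(extract_ngrams(tokens, max_n))
    # Counter returns 0 for missing n-grams, so sqrt(0) adds nothing.
    return [score(g) + math.sqrt(counts[g]) for g in selected]
```

For the tokens `"not good not good".split()`, with `selected = [("not", "good"), ("bad",)]` and a score of -1.0 for `("not", "good")` and 0.0 otherwise, the vector is about `[0.414, 0.0]`. The square root damps repeated occurrences, so an n-gram seen twice counts for about 1.41, not 2.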

## Extra Data Sources

• AFINN-111.txt
• Positive word list (ignored): did not help at all.
• Negative word list (ignored): did not help at all.
• Negation ("not") handling (ignored): so far it has only made results worse.

# Implementation

## Storing N-grams

N-grams are stored with a string-to-integer mapping: each distinct n-gram string is assigned an integer ID.
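A minimal sketch of such a mapping, assuming n-grams arrive as token tuples and are keyed by their space-joined string (so tokens must not contain spaces). The class `NGramIndex` and its methods are hypothetical, not the project's actual code.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


class NGramIndex:
    """Assigns each distinct n-gram string a dense integer ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._to_id: Dict[str, int] = {}
        self._to_str: List[str] = []

    def id_of(self, ngram: NGram) -> int:
        """Return the ID of the n-gram, assigning the next free ID if unseen."""
        key = " ".join(ngram)
        if key not in self._to_id:
            self._to_id[key] = len(self._to_str)
            self._to_str.append(key)
        return self._to_id[key]

    def string_of(self, idx: int) -> str:
        """Reverse lookup: ID back to the n-gram string."""
        return self._to_str[idx]

    def __len__(self) -> int:
        return len(self._to_str)
```

Usage: a fresh index maps `("not", "good")` to 0 and `("good",)` to 1; asking for `("not", "good")` again returns 0, and `string_of(0)` gives `"not good"`. Dense integer IDs let counts and feature vectors be plain arrays indexed by ID, which is cheaper than keying everything by strings.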

## Pseudocode

### Resources Supporting the N-gram Sieve

• AFINN-111.txt: a list of words with sentiment scores (integers from -5 to +5).
• Stop word list: small set, |S| < 20.
• List of synonyms for "not": unused.
• Abbreviation list: a small set of rules, |R| < 10.
• No CRF, no parse tree, no subjectivity filter.
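Two building blocks for the sieve can be sketched as follows. The loader assumes the tab-separated `term<TAB>score` layout of AFINN-111.txt (a term may contain spaces, such as a short phrase). The filtering rule, which drops n-grams made up entirely of stop words, is a hypothetical choice for illustration, not the project's documented sieve. The stop words in the usage line are illustrative, not the project's 11-word list.

```python
from typing import Dict, Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def load_afinn(lines: Iterable[str]) -> Dict[str, int]:
    """Parse lines of the form 'term<TAB>score' into a term -> score dict."""
    scores: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the last tab so multi-word terms stay intact.
        term, value = line.rsplit("\t", 1)
        scores[term] = int(value)
    return scores


def sieve(ngrams: Iterable[NGram], stop_words: Set[str]) -> List[NGram]:
    """Keep n-grams with at least one non-stop word (hypothetical rule)."""
    return [g for g in ngrams if any(tok not in stop_words for tok in g)]
```

In practice `load_afinn` would be called on an open file handle. With stop words `{"the", "of", "a"}`, `sieve([("the",), ("the", "movie"), ("of", "the")], ...)` keeps only `("the", "movie")`.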

# To Do

Train the classifier on 5,000 subjective and 5,000 objective processed sentences.

http://www.cs.cornell.edu/People/pabo/movie-review-data/