UVa 11855 - Buzzwords

contents

  1. Problem
  2. Input Specification
  3. Sample Input
  4. Output Specification
  5. Output for Sample Input
  6. Solution

Problem

The word “the” is the most common three-letter word. It even shows up inside other words, such as “other” and “mathematics”. Sometimes it hides, split between two words, such as “not here”. Have you ever wondered what the most common words of lengths other than three are?

Your task is the following. You will be given a text. In this text, find the most common word of length one. If there are multiple such words, any one will do. Then count how many times this most common word appears in the text. If it appears more than once, output how many times it appears. Then repeat the process with words of length 2, 3, and so on, until you reach such a length that there is no longer any repeated word of that length in the text.
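To make the task concrete, here is a minimal brute-force sketch. It is not the approach used in the Solution section, and the function name `buzzwordCounts` is a hypothetical helper introduced only for illustration. It strips spaces, counts every substring of each length in a `std::map`, and stops at the first length with no repeated substring.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: for lengths 1, 2, 3, ... return the highest occurrence
// count among substrings of that length, stopping at the first length where
// no substring occurs at least twice. Spaces are dropped first.
std::vector<int> buzzwordCounts(const std::string &line) {
    std::string s;
    for (char c : line)
        if (c != ' ') s += c;

    std::vector<int> result;
    for (std::size_t len = 1; len <= s.size(); len++) {
        std::map<std::string, int> freq;
        int best = 0;
        for (std::size_t i = 0; i + len <= s.size(); i++)
            best = std::max(best, ++freq[s.substr(i, len)]);
        if (best < 2) break;  // no repeated word of this length
        result.push_back(best);
    }
    return result;
}
```

Calling `buzzwordCounts("THE OTHER MATHEMATICS NOT HERE")` yields 5, 4, 4, 2, 2. Copying and comparing whole substrings makes this slow on long lines that contain long repeats, which is why the solution further down uses a suffix array instead.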

Input Specification

The input consists of a sequence of lines. The last line of input is empty and should not be processed. Each line of input other than the last contains at least one but no more than one thousand uppercase letters and spaces. The spaces are irrelevant and should be ignored.

Sample Input

THE OTHER MATHEMATICS NOT HERE
AA

Note that the last line of the sample input is a blank line.

Output Specification

For each line of input, output a sequence of lines, giving the number of repetitions of words of length 1, 2, 3, and so on. When you reach a length such that there are no repeated words of that length, output one blank line, do not output anything further for that input line, and move on to the next line of input.

Output for Sample Input

5
4
4
2
2

2

Solution

Problem description:

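For each length 1, 2, 3, 4, … in turn, output how many times the most frequent string of that length occurs. Only lengths whose highest count is greater than 1 are considered.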

Approach:

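Because the problem looks only at uppercase letters and ignores spaces, a line made up only of spaces needs special handling; otherwise the subsequent trim() leads to wrong output.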

Apply a suffix array directly, built with the doubling algorithm, and also build the height array.
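To show what the two arrays contain, here is a small sketch that builds them the naive way, by sorting the suffixes directly. This is deliberately not the doubling algorithm named above, and the function names `naiveSuffixArray` and `naiveHeight` are hypothetical, chosen for this illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort the suffix start positions by comparing the
// suffixes themselves. Far slower than doubling; for illustration only.
std::vector<int> naiveSuffixArray(const std::string &s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// height[i] = length of the longest common prefix of the suffixes starting
// at sa[i-1] and sa[i]; height[0] is defined as 0.
std::vector<int> naiveHeight(const std::string &s, const std::vector<int> &sa) {
    std::vector<int> h(sa.size(), 0);
    for (std::size_t i = 1; i < sa.size(); i++) {
        std::size_t a = sa[i - 1], b = sa[i], k = 0;
        while (a + k < s.size() && b + k < s.size() && s[a + k] == s[b + k])
            k++;
        h[i] = static_cast<int>(k);
    }
    return h;
}
```

For "BANANA" the sorted suffixes are A, ANA, ANANA, BANANA, NA, NANA, so the suffix array is 5, 3, 1, 0, 4, 2 and the height array is 0, 1, 3, 0, 0, 2.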

Construction runs in O(n log n), but the output stage can reach O(n^2) in the worst case.
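The quadratic output stage can be sketched in isolation. For a length L, a run of r consecutive height values that are all at least L means r + 1 neighbouring suffixes share a prefix of length L, so the answer for L is the longest such run plus one. The function name `countsFromHeight` is a hypothetical helper. It takes the height array as input and mirrors the scan in the main loop of the full program.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: given a height (LCP) array, return for L = 1, 2, ...
// the largest number of suffixes sharing a common prefix of length L,
// stopping at the first L for which no prefix is shared.
std::vector<int> countsFromHeight(const std::vector<int> &h) {
    std::vector<int> result;
    for (int len = 1; len <= static_cast<int>(h.size()); len++) {
        int run = 0, best = 0;
        for (int x : h) {
            run = (x >= len) ? run + 1 : 0;  // extend or reset the current run
            best = std::max(best, run);
        }
        if (best == 0) break;        // nothing of this length repeats
        result.push_back(best + 1);  // r adjacent pairs cover r + 1 suffixes
    }
    return result;
}
```

With the height array 0, 1, 3, 0, 0, 2 of "BANANA" this returns 3, 2, 2: "A" occurs three times, "AN" twice, and "ANA" twice. Each length costs one pass over the array, which is where the O(n^2) bound comes from.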

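Other solutions use a hash function for the comparison (every word is dropped into a bucket by its hash, and the comparison is done afterwards). That is also a fairly good approach, but it still carries some risk, presumably from hash collisions.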
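A minimal sketch of what such a hash-based solution might look like follows. It is an assumption about the general technique, not a reconstruction of any particular submission. The function name `buzzwordCountsHashed`, the base 131, and the choice of unsigned 64-bit wraparound arithmetic are all illustrative. Unlike the bucket-then-compare scheme described above, this version treats equal hashes as equal substrings and never verifies them, so a collision would inflate a count.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: per-length maxima computed from polynomial hashes.
// Equal hashes are assumed to mean equal substrings (collisions possible).
std::vector<int> buzzwordCountsHashed(const std::string &line) {
    std::string s;
    for (char c : line)
        if (c != ' ') s += c;
    const std::size_t n = s.size();
    const std::uint64_t BASE = 131;  // arbitrary base; arithmetic wraps mod 2^64

    // prefix[i] = hash of s[0..i), power[i] = BASE^i
    std::vector<std::uint64_t> prefix(n + 1, 0), power(n + 1, 1);
    for (std::size_t i = 0; i < n; i++) {
        prefix[i + 1] = prefix[i] * BASE + static_cast<unsigned char>(s[i]);
        power[i + 1] = power[i] * BASE;
    }

    std::vector<int> result;
    for (std::size_t len = 1; len <= n; len++) {
        std::unordered_map<std::uint64_t, int> freq;
        int best = 0;
        for (std::size_t i = 0; i + len <= n; i++) {
            // hash of s[i..i+len) derived from the prefix hashes in O(1)
            std::uint64_t h = prefix[i + len] - prefix[i] * power[len];
            best = std::max(best, ++freq[h]);
        }
        if (best < 2) break;  // no repeated hash of this length
        result.push_back(best);
    }
    return result;
}
```

Each substring hash is obtained in constant time, so one length costs a single pass over the line. Comparing the actual substrings whenever two hashes match would remove the collision risk at the cost of extra work.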

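#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <algorithm>
using namespace std;
struct SuffixArray {
    int sa[1005], h[1005], n;        // suffix array, height (LCP) array, length
    int w[1005], ta[1005], tb[1005]; // counting-sort buckets and rank buffers
    char str[1005];
    // Counting sort: place the indices listed in y into sa, ordered by key x[].
    void sort(int *x, int *y, int m) {
        int i;
        for(i = 0; i < m; i++)
            w[i] = 0;
        for(i = 0; i < n; i++)
            w[x[y[i]]]++;
        for(i = 1; i < m; i++)
            w[i] += w[i-1];
        for(i = n-1; i >= 0; i--)
            sa[--w[x[y[i]]]] = y[i];
    }
    // True if suffixes a and b agree on both halves (rank at a and at a+l).
    bool cmp(int *r, int a, int b, int l) {
        if(r[a] == r[b]) {
            if(a+l >= n || b+l >= n)
                return false;
            return r[a+l] == r[b+l];
        }
        return false;
    }
    // Height array: h[i] = longest common prefix of suffixes sa[i-1] and sa[i].
    void build_h() {
        int i, k;
        for(i = 0; i < n; i++) ta[sa[i]] = i; // ta now holds each suffix's rank
        for(i = 0; i < n; i++) {
            if(ta[i] == 0) {
                h[ta[i]] = 0;
                continue;
            }
            // Reuse the previous result: the LCP shrinks by at most one.
            if(i == 0 || h[ta[i-1]] <= 1)
                k = 0;
            else
                k = h[ta[i-1]]-1;
            // The terminating '\0' of str stops the comparison.
            while(str[sa[ta[i]-1]+k] == str[sa[ta[i]]+k])
                k++;
            h[ta[i]] = k;
        }
    }
    // Doubling construction. x: rank, y: indices ordered by second key
    void build() {
        int i, k, m = 128, p;
        int *x = ta, *y = tb, *z;
        n = strlen(str);
        x[n] = 0;
        for(i = 0; i < n; i++)
            x[i] = str[i], y[i] = i;
        sort(x, y, m);
        for(k = 1, p = 1; p < n; k *= 2, m = p) {
            // Suffixes shorter than k have an empty second key and go first.
            for(p = 0, i = n-k; i < n; i++)
                y[p++] = i;
            for(i = 0; i < n; i++) {
                if(sa[i] >= k) {
                    y[p++] = sa[i]-k;
                }
            }
            sort(x, y, m);
            z = x, x = y, y = z;
            // Assign new ranks; equal key pairs share a rank.
            for(i = 1, p = 1, x[sa[0]] = 0; i < n; i++)
                x[sa[i]] = cmp(y, sa[i-1], sa[i], k) ? p-1 : p++;
        }
    }
};
int main() {
    static SuffixArray in;
    // gets() was removed from the language, so read with fgets() instead.
    while(fgets(in.str, sizeof(in.str), stdin)) {
        in.str[strcspn(in.str, "\r\n")] = '\0'; // drop the line ending
        if(in.str[0] == '\0')                   // empty line ends the input
            break;
        int n = 0;
        for(int i = 0; in.str[i]; i++)          // strip the spaces in place
            if(in.str[i] != ' ')
                in.str[n++] = in.str[i];
        in.str[n] = '\0';
        in.build();
        in.build_h();
        if(n == 0)                              // the line held only spaces
            puts("0");
        for(int i = 1; i <= in.n; i++) {
            // Longest run of adjacent suffixes sharing a prefix of length i.
            int cnt = 0, ret = 0;
            for(int j = 0; j < in.n; j++) {
                if(in.h[j] >= i)
                    cnt++;
                else
                    ret = max(ret, cnt), cnt = 0;
            }
            ret = max(ret, cnt);
            if(ret <= 0)
                break;
            printf("%d\n", ret + 1);
        }
        puts("");
    }
    return 0;
}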