Company Ghost Story 公司鬼故事 16

新人面談

找人來面試,如果像某些公司只有聊天,恐怕會忽略下面的狀況。

根據碩士研究的主題進行用人判定

  • 研究 GPU 的人不懂圖像渲染模型
  • 研究資安的人不懂 AES
  • 研究編碼壓縮的人不懂進制轉換
  • 研究機器學習的人不懂線性變換

不排除公司薪水福利差。即便是台清交成的學生,也只能找來有限度的研究能力人才。

研究報告

  • 報告描述沒有單位
    • 記憶體單位是什麼?MB?GB?
    • 時間單位是什麼?秒?毫秒?
  • 比較結果沒有參考對象
    • 相較於哪一個版本的?版本號是多少?
    • 使用哪一個環境的?32 位元?64 位元?
  • 文章標題沒有結構性 H1, H2, H3, …
  • 先描述結果,但定義放最後面。
  • 實驗組合結果,項目描述只有數據單位。

不排除公司薪水福利差。即便是台清交成的學生,論文可能都是教授親自寫的、實驗教授做的。

基礎學問

1
2
// pow(2, x) = y
Given y, find a upper integer x, satisfy pow(2, x) <= y.

拿到兩個自然對數的實作 (log(y)/log(2))。滿心好奇卻得到一個說法「浮點數除法沒有誤差」。那多問一句「取自然對數沒有誤差嗎?」我內心浮上答案驗證了事實「沒有」這個要牽扯說學校沒教嗎?某方面來說,道理過不去。

雜湊

雜湊任何操作都是常數 $\mathcal{O}(1)$

  1. 知道什麼是攤銷複雜度嗎?$\text{amortized} \; \mathcal{O}(1)$
  2. 拿到第一個元素是 $\mathcal{O}(1)$
1
2
3
4
HashSet<Integer> hash = new HashSet<>(input);
while (!hash.isEmpty()) {
hash.remove(hash.iterator().next());
}

這都不好說。實作方式千奇百種,上述看起來簡單,在 Java 裡面卻是 $\mathcal{O}(n^2)$

Read More +

Company Ghost Story 公司鬼故事 15

Redundant Debug Message

1
2
3
class Placement {
Image image = new BufferedImage(1024, 768); // unused
}

Naming

1
2
class Colorizer { setDrawDevice(Device), removeDrawDevice(Device)}
// setDisplay(Device, boolean)

Inheritance

1
2
class XXXPanel { Panel getPanel(); }
// why not extends Panel

String Equal

1
2
if (input.contains("yes"))
// ???, equals???

Key in Condition

1
2
3
4
5
6
7
void drawTick(Color axisColor) {
if (color.equals(Color.RED))
dx++;
else
dy++;
}
// Color as key? why not pass enum Axis

Out-of-Knowledge in Library

1
2
3
4
class AbstractTableViewRenderer {
EDA_DIE_ONLY_RENDER dieRender;
}
// don't put your shit on UI library

Over-overloading for Lambda

1
2
3
4
5
6
class AbstractTableView {
public void with(MouseInterface)
public void with(TextInterface)
public void with(IconInterface)
}
// overloading is not for heterogeneous

Oversize

1
2
long degree; // [0, 360)
// oooooooooooooover-size

Missing Else

1
2
3
4
if (geom instanceof Circle) { renderAsBall((Circle) geom); }
else if (geom instanceof Oblong) { if (type == ball) renderAsCylinder(geom);} // new feature
else renderAsPoly(geom.toPoly());
// fuckup, default as polygon, where is your else
1
2
3
4
setProperty(Object dbo, String name, Object val) {
if ("group".equals(name)) Group.set(dbo, val);
}
// where is your else and exception

OK, then Action

1
2
3
4
buttonOK.setOnAction(e -> apply());
void apply() {...; rebuildModel();}
slider.setOnChange(e -> rebuildModel()); // new feature
// our performance is gone by you

Listener

1
2
3
4
void setDisplay(...) {
fireProperytChange("SetDisplay");
}
// ... property is display, not the method name

Time-consuming Message

1
2
Object.requireNotNull(number, "The number is not illegal by " format + " with" + ... context);
// don't compose the error message, construct it on demand.

Time-consuming Render

1
2
3
4
5
class ListRenderer {
void render(ListItem e) {
ImageIO.read(getImagePath(e.getValue().getType()));
}
}

Crazy UI Design

1
2
3
4
5
6
7
|------------------|
| Chooser |
+------------------+
| [ComboBox] | # 10^6 objects
| [Cancel] [OK] |
+------------------+
# tell me, how do you pick
Read More +

Company Ghost Story 公司鬼故事 14

Java

Format vs. Concatenation

1
String.format("format " + a + "%s", b);

大部分情況,很少去生成 format string,
混合使用很容易遇到表示錯誤,因為當 a 中出現 % 等特殊字元時,解析上會發生錯誤。最好是統一一種使用規則。

Comparator Signum

1
2
3
4
5
long a = func(x);
long b = func(y);
long dir = mDir;
return (int) a - (int) b; // SHOULD Long.compare(a, b);
return (int) (a - b) * dir; // SHOULD Long.compare(a, b) * Long.signum(dir);

隨意 casting 三不五時就會 arithmetic overflow 搞砸了排序。

What’s Not-Equal

1
2
3
4
5
6
if ((a == null && b != null) ||
(a != null && b == null) ||
!a.equals(b)) {
}
// WAIT, if a == null and b == null ?
// SHOULD Objects.equals(a, b);

有內建方便使用的函數,別再這樣寫了。

Sync-Up

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class Record {
boolean merge() {
return a.setParent(b.getParent().copy());
// refactor, a.setParent(b.getParent());
}
}
class Merger {
void process(Record r) {
try {
merge();
} catch (Exception e) {
// nothing here
}
}
}

為什麼偷偷搞了一個 Exception 又不給 log message,一不小心就 refactor 到奇怪的東西,曾經把 NullPointerException 視為一個正常邏輯,修掉來自另一個 call path 的 NPE,卻又踩了另一塊的雷。

Keyboard

1
2
3
4
SHIFT + INSERT = CTRL + V
CTRL + C
CTRL + Z + ENTER
HOME/END

大家耳熟能詳的 CTRL C+V,都不知道 CTRL+INSERT/SHIFT+INSERT,而在 X Window System 上,選取時會自動複製,不用額外按下複製操作。

至於問 INSERT/DELETE/HOME/END 是什麼的小夥伴,還是好好看清楚鍵盤吧。

Read More +

Shiny People 感謝

背景原因

公司有一套 Shiny 機制,可以在特別日期或事件後,來表揚合作不錯的長官同事。有時候表揚可以帶鼓勵金的,當然這筆錢不會由您支付,而是由部門的某一個錢包裡出,理所當然得不可能存在同部門互相洗錢的行為,仍然遵循某一種流程。

詳細信件內容有點不同了,就當作有朝一日會在網路上搜尋到吧。

任職周年慶祝

Software Architect

To Jeffrey Morehouse:

I really appreciate your great art, which facilitates high productivity compared to other products. In these years, give the opportunity to develop my improvement thought in core. During coding experiment, your resource and past experience help me a lots. Although we’re working on different ways these years, I’m looking forward to having technical discussion with you one day. Congratulate to Techholic!

感謝您轉交下來的偉大傑作,讓我們產品相較於其他產品
擁有非常高的生產力、極低的開發週期。同時在這幾年間,
給予我機會去實踐自己的想法。並在開發過程中,給不少
資源與過去經驗參考,這些都讓我學了很多。

雖然這幾年間沒有共事,期待未來可以分享這些技術細節。
為我們的技術狂熱慶祝!

Manager

To Thunder Lay:

I’m glad to be the early employee in your part. The unfailingly patient leader is very important to me in first job. Even though we’re good at different profession, we learned from each other to complish great big tasks these years. That is very good experience, and rare in many kinds of job career.

很高興地成為早期的部門成員,對我第一份工作而言,
您的耐心指導成為很大的助力,雖擅長不同領域的專長,
在這幾年間,取長補短地完成了許多重大任務,這真的
是在其他職業生涯中可未必能見到的非常好的經驗。

結語

寫完英文,看著自己的譯文,赫然發現我的中文死去,找些朋友提前問個意見後,只得到了一致的答覆「看不懂」。在沒有前因後果,不具備相同處理的條件下,看起來可能是誰要離職了吧。

Read More +

Company Ghost Story 公司鬼故事 13

Java

Default Constructor Trap

1
2
3
4
5
6
class Point {
Point() {
throw new UnsupportedOperationException(); // why?
}
Point(int x, int y) {...}
}

一開始不宣告就好了,當有自定義的建構子,預設建構子自然無法被使用,沒必要設一個陷阱讓別人踩。

Feature and Backward Compatibility

1
2
3
4
5
6
class Engine {
private String mFeature = ""; // Why not null ?
}
class DbObj {
private String mFeature = ""; // Why not null ?
}

空與空字串的意義不同,預設為空字串 ("") 的狀況並不多見,為空 (null) 在序列化和反序列化時的差異就很大了。

Fancy Check-and-Cast

1
2
3
Shape shp;
if (shp.getUserName().equals("circle")) // wait, where is your `instanceof`
return ((Circle) shp).getBound();

Call-By-Value

1
2
3
4
5
void readColRow(int col, int row) {
Pair p = readPair();
col = p.first;
row = p.second;
}

Java 沒有 reference type 的這操作。

Slow Feature

1
2
3
void exec() {
for (;;) {} // where is throw new UnsupportedOperationException();
}

我可不想跟客戶說「你再等等,這功能只是比較慢」的謊言。

Read More +

Company Ghost Story 公司鬼故事 12

Java

Fancy OOP

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class MSTAlgorithmEngine {
JButton mApplyButton; // why, ???
JPanel mUI; // why, ???
}
class DiffRecord {
Action mMergeAction; // why, ???
Action mShowAction; // why, ???
}
class Rectangle {
void normalize() {
new NormalizeAction().actionPerformed(new ActionEvent(ActionEvent.PERFORMED));
} // I cannot deal with this shit anymore.
class NormalizeAction extends AbstractAction {
// real implementation for model class
}
}

資料歸資料,算法歸算法,別把 UI 牽扯進來。

Shit Cache

1
2
3
4
5
6
7
8
class Engine {
private static CacheStructure sCache;
public void exec() {
resetCache();
// fill cache;
} // but, when will you clear sCache, sCache = null?
}

快取的污染啊,那些殘存的資料依舊卡在記憶體內,是無法被 GC 回收的。反而,造成下一個運行的更容易出現記憶體不足。

Shit Cache in Class

1
2
3
4
5
6
7
8
9
10
11
class Dbo extends DbObject {
private CacheStructure mCache; // for internal recursion, why this thing is here?
public Object getSomething() {
// init cache
return internalGetSomething(parameters);
} // but, when will you clear cache, cache = null?
private Object internalGetSomething(parameters) {
// ...
}
}

別亂放你的資料結構,這關資料庫物件什麼事?

Read More +

Java JNI GC Thread Error (EINVAL)

Keywords: JNI, Java, GC Thread, add_workers, static library, Thread Error

Solution: -XX:+AdjustStackSizeForTLS from JDK 14

Description

We cannot launch GC thread of JVM from JNI. And, it shows

1
[os,thread] Failed to start thread - pthread_create failed (EINVAL) for attributes: stacksize: 1024k, guardsize: 0k, detached.

The main reason is the glibc bugs for native C code issues. For example, we allocate thread-local variables by static library linking.

1
2
3
__thread int buf[1048576];
thread_local int buf[1048576];
#pragma omp threadprive(buf);

And then, glibc 2.12 will move them on stack, and require minimum stack size at least 1024k bytes. But, the default configuration JVM of JDK11 will minimum stacksize to be 1024k bytes as default in Linux OS. Therefore, the pthread_create throws the error EINVAL for specific stacksize setting.

Options

https://www.oracle.com/java/technologies/javase/14all-relnotes.html

The glibc library allocates some thread-local storage (TLS) in the stack of a newly created thread, leaving less stack than requested for the thread to do its work. This is particularly a problem for threads with small stack sizes. It is an inherited issue from a well-known glibc problem, ‘Program with large TLS segments fail’ [0] and has been observed in Java applications. In one of the reported JVM failure instances, the issue manifests as a StackOverflowError on the process reaper thread, which has a small stack size. The java.lang.Thread constructor enables users to specify the stack size for a new thread. The created thread may encounter the TLS problem when the specified size is too small to accommodate the on-stack TLS blocks.

In JDK 8, a system property, jdk.lang.processReaperUseDefaultStackSize, was introduced to address the TLS issue only for reaper threads. Setting the property gives a bigger stack size to the reaper threads.

To address the issue for all threads, a general purpose workaround was implemented in Java which adjusts thread stack size for TLS. It can be enabled by using the AdjustStackSizeForTLS command-line option:

When creating a new thread, if AdjustStackSizeForTLS is true, the static TLS area size is added to the user requested stack size. AdjustStackSizeForTLS is disabled by default.

Conclusion

整合進大型程式時,要多觀察靜態編譯的部分,如果有一些 TLS 的宣告,則會增加其他預設執行緒的最小使用堆疊大小。而 JVM 又傾向使用最少資源去完成,則很容易在計算最小資源時發生問題。

如果去查閱 JDK15 的 代碼 os_linux.cpp 時,有一段到動態鏈結函數庫 glibc 的詢問最低需求來解決此問題。這一段牽涉到 hack 技巧,偷偷利用 symbol 爬出私有函數,不算正常操作行為。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
// On Linux, glibc places static TLS blocks (for __thread variables) on
// the thread stack. This decreases the stack size actually available
// to threads.
//
// For large static TLS sizes, this may cause threads to malfunction due
// to insufficient stack space. This is a well-known issue in glibc:
// http://sourceware.org/bugzilla/show_bug.cgi?id=11787.
//
// As a workaround, we call a private but assumed-stable glibc function,
// __pthread_get_minstack() to obtain the minstack size and derive the
// static TLS size from it. We then increase the user requested stack
// size by this TLS size.
//
// Due to compatibility concerns, this size adjustment is opt-in and
// controlled via AdjustStackSizeForTLS.
typedef size_t (*GetMinStack)(const pthread_attr_t *attr);
GetMinStack _get_minstack_func = NULL;
static void get_minstack_init() {
_get_minstack_func =
(GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
log_info(os, thread)("Lookup of __pthread_get_minstack %s",
_get_minstack_func == NULL ? "failed" : "succeeded");
}
// Returns the size of the static TLS area glibc puts on thread stacks.
// The value is cached on first use, which occurs when the first thread
// is created during VM initialization.
static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
size_t tls_size = 0;
if (_get_minstack_func != NULL) {
// Obtain the pthread minstack size by calling __pthread_get_minstack.
size_t minstack_size = _get_minstack_func(attr);
// Remove non-TLS area size included in minstack size returned
// by __pthread_get_minstack() to get the static TLS size.
// In glibc before 2.27, minstack size includes guard_size.
// In glibc 2.27 and later, guard_size is automatically added
// to the stack size by pthread_create and is no longer included
// in minstack size. In both cases, the guard_size is taken into
// account, so there is no need to adjust the result for that.
//
// Although __pthread_get_minstack() is a private glibc function,
// it is expected to have a stable behavior across future glibc
// versions while glibc still allocates the static TLS blocks off
// the stack. Following is glibc 2.28 __pthread_get_minstack():
//
// size_t
// __pthread_get_minstack (const pthread_attr_t *attr)
// {
// return GLRO(dl_pagesize) + __static_tls_size + PTHREAD_STACK_MIN;
// }
//
//
// The following 'minstack_size > os::vm_page_size() + PTHREAD_STACK_MIN'
// if check is done for precaution.
if (minstack_size > (size_t)os::vm_page_size() + PTHREAD_STACK_MIN) {
tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
}
}
log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT,
tls_size);
return tls_size;
}
Read More +

Company Ghost Story 公司鬼故事 11

Java

Over Overload

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class OhMyGod {
public void run() {System.out.println("A");}
public void run(String t) {System.out.println("B");}
public <T> void run(T[] t) {System.out.println("C");}
public <T> void run(T t) {System.out.println("D");}
public static void run(String... t) {System.out.println("E");}
public static void main(String[] args) {
new OhMyGod().run();
new OhMyGod().run(new String[0]);
new OhMyGod().run("1");
new OhMyGod().run("1", "2");
}
} // AEBE, little strange

One-Format Multiple-Ways

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class FileIn {
private boolean isCSV; // why not enum
private boolean isXML;
private boolean isJSON;
public void setIsCSV(boolean e) {isCSV = e;}
public void setIsXML(boolean e) {isXML = e;}
public void setIsJSON(boolean e) {isJSON = e;}
public void import(File file) {
// why not if-else
if (isCSV) CSVIn(file); //???
if (isXML) XMLIn(file); //???
if (isJSON) JSONIn(file); //???
}
}
Read More +

Company Ghost Story 公司鬼故事 10

Java

Compare Function

1
2
3
4
5
6
7
Collections.sort(arr, (x, y) -> (int) (x.cost - y.cost));
Collections.sort(arr, (x, y) -> Long.valueOf(x).compareTo(y));
Collections.sort(arr, (x, y) -> (int) Math.round(x.cost - y.cost));
// why not use official methods?
Collections.sort(arr, (x, y) -> Double.compare(x.cost, y.cost));
Collections.sort(arr, (x, y) -> Long.compare(x, y));

別鬧了,會 overflow/underflow,數量一大,每一個排序都在丟例外給我啊。

Partition Function

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class Optimizer {
private int avgX(ArrayList<Point> pts) {
ArrayList<Long> x = new ArrayList<>(pts.size());
for (Point p : pts)
x.add(p.x);
return avg(x);
}
private int avg(ArrayList<Long> xs) {
long sum = 0;
for (Long x : xs)
sum += x;
return sum / xs.size();
}
}
// ... why not merge avg() into avgX()?

每次都要額外建一個陣列,而且這個函數只有被內部的單一函數呼叫,函數切這麼細,但是效能卻爛掉。

Try-Catch vs. If-Else

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class Parser {
void addChar(char c) {
try {
que[index] = c;
} catch (ArrayIndexOutOfBoundsException e) {
doubleArray();
que[index] = c;
}
index++;
}
}
// why not just write if-else statements, it's really hard to maintain your code.
class Parser {
void addChar(char c) {
if (index >= que.length)
doubleArray(index);
que[index] = c;
index++;
}
}

如果在修改這幾段代碼的時候,還得經常考慮例外處理是怎麼做的,這寫法真的太奇葩。要是開剖析器去檢查,會經常看到例外出現,那到底是好事還壞事?

Read More +

Java 優化內存使用 組合篇

整數映射 (Integer Mapping)

算法經常會使用整數映射到物件,而整數範圍就會影響到內存。為了技術性能 (capability),直接宣告

1
2
Map<Long, Object> a;
TLongObjectMap<Object> b;

以應付未來變化。考慮到鍵值欄位,實作上建立 $\mathcal{O}(n)$ long[]。在初始狀態下,大多情境只使用 int[] 的情況。為此,我們考慮分流使用

1
2
3
4
class MMap<T> {
TLongObjectMap<T> longMap;
TIntObjectMap<T> intMap;
}

使用這樣子的概念,通用狀態可以降低 $50\%$ 內存,均勻分布情況則可以降低 $25\%$

陣列替代 (Array Substitute)

前一篇,我們提到了將 ArrayList 換成最原生的 T[]。更進一步,以前寫 C 的時候,壓根不在意 T[].length 這一個需求,因為題目已經寫死在某個全區變數裡。可在 Java 裡頭,我們宣告不出純粹的陣列,只能硬生生多出 $4 \; \text{bytes}$

1
2
3
4
class Scheme {
K[] keys; // as known, keys.length = 10
V[] vals; // as known, vals.length = 10
}

換個方向思考,那朝著減少試試。進行陣列雙向合併

1
2
3
class Scheme {
Object[] keyval; // [key0, key1, ..., val1, val0]
}

Scheme 物件中,我們減少了 $8 \; \text{bytes}$ 的額外資訊。但在使用時, casting 是個開銷。

欄位轉移 (Field Translation)

1
2
3
Map<T, String> map;
map.put(a, "MARKED");
map.put(b, "UNMARKED");

當我們還沒有仔細規劃結構時,直接用 has-A 的精神去寫第一步。回過頭來,就會造成不少零散的映射出現。如果映射使用雜湊實作,而不是樹的話,整體的記憶體效率會低個 $33\%$,因為無法塞滿所有的 hash table 中的桶子。

1
2
3
4
class T {
...
Object userData;
}

那為了提升記憶體效率,先檢查是不是所有的項目,幾乎都在這個映射裡面。如果是的話,將這個欄位轉移到目標類別上。看起來很不美觀,但進一步的資料佈局是很可觀的,甚至可以比雜湊 $\mathcal{O}(1)$ 更快的存取。

卸載策略 (Offloading Strategy)

在平行處理中,我們可以劃分成不同數量級的策略。在小規模下,嘗試先以 CPU 計算,若發現更大規模,將資料卸載給其他異質計算單元來完成。同樣的道理,對於資料結構也能更近一步地劃分。

如集合實作:

1
2
3
class MSet {
Object mRaw; // size < 8: LinkedList; otherwise, HashSet
}

由於 Java 中的 HashSet<T> 並不是單一欄位的實作,實值為 HashMap<T, T>,相較於 C/C++ 的狀態,空間多了兩、三成。我們可以簡單使用 LinkedList 取代 HashSet,並且避免小規模數據中未能填滿 hash table 的狀態。在元素個數為一時,串列的記憶體效率 $100 \%$,但雜湊集合只有 $12.5 \%$。換句話說,有時候數據分布很極端,內存用量就差了十倍。

Read More +