diff --git a/.gitbook/assets/Ex4-1.png b/.gitbook/assets/Ex4-1.png
new file mode 100644
index 0000000..30e0570
Binary files /dev/null and b/.gitbook/assets/Ex4-1.png differ
diff --git a/.gitbook/assets/Ex4-2.png b/.gitbook/assets/Ex4-2.png
new file mode 100644
index 0000000..1faa152
Binary files /dev/null and b/.gitbook/assets/Ex4-2.png differ
diff --git a/.gitbook/assets/Ex4-3.png b/.gitbook/assets/Ex4-3.png
new file mode 100644
index 0000000..9d35d90
Binary files /dev/null and b/.gitbook/assets/Ex4-3.png differ
diff --git a/.gitbook/assets/Ex4-4.png b/.gitbook/assets/Ex4-4.png
new file mode 100644
index 0000000..454f37a
Binary files /dev/null and b/.gitbook/assets/Ex4-4.png differ
diff --git a/.gitbook/assets/Ex4-5.png b/.gitbook/assets/Ex4-5.png
new file mode 100644
index 0000000..46d1736
Binary files /dev/null and b/.gitbook/assets/Ex4-5.png differ
diff --git a/.gitbook/assets/Ex4-6.png b/.gitbook/assets/Ex4-6.png
new file mode 100644
index 0000000..8ba6858
Binary files /dev/null and b/.gitbook/assets/Ex4-6.png differ
diff --git a/README.md b/README.md
index 93576a5..c933363 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,20 @@ Getting started with static program analysis. Read this and start writing your f

 An introduction to static program analysis. Read this book and start writing your first static program analyzer!

+## I want to hear from you
+
+- **Criticism is valuable.** This is the first tutorial I have written, so there are bound to be many rough spots. If you think something is poorly written, please open an issue or email me (ranger.nju#gmail.com).
+- If you like what I have written, you can give the GitHub repository a Star, or share it in your own circles so that more people learn about the project.
+
+
+## Changelog and milestones
+
+1. Oct, 2020. Created the GitHub repo
+2. 16th, Oct. Received the first Star and the first Fork
+3. 29th, Oct. Added Lesson 7, "Interprocedural Analysis", to Chapter 4
+4. 30th, Oct. Handled the first PR and updated the document structure
+
+
 # Who is this Static Program Analysis tutorial for?

 Students, developers, researchers... almost everyone living in the modern world can benefit from it.
@@ -54,7 +68,7 @@ Getting started with static program analysis. Read this and start writing your f
     - Null pointer dereferences, memory leaks, and the like: almost every programmer has been troubled by these two problems
 2. Improve program security
     - Private information leak, injection attack, etc.
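-    - Private information leaks: TODO
+    - Private information leaks: this problem is fairly common in mobile apps. If you are interested, see [this paper](https://www.ieee-security.org/TC/SP2012/posters/ScanDal.pdf).
     - [Injection attacks](https://en.wikipedia.org/wiki/Code_injection): a very common topic in network security. Readers unfamiliar with it can look at the SQL injection examples on [W3School](https://www.w3schools.com/sql/sql_injection.asp) or [Wiki](https://en.wikipedia.org/wiki/SQL_injection).
 3. Provide the underlying techniques for compiler optimization
     - Dead code elimination, code motion, etc.
diff --git a/SUMMARY.md b/SUMMARY.md
index 73af91d..4d49c6b 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -8,13 +8,7 @@
   * [First Encounter: What Is Static Analysis?](ch1/01-01-whats-spa.md)
   * [Intermediate Representation: The Input to a Static Analyzer](ch1/01-02-ir.md)
 * [Chapter 2 - Data-Flow Analysis: Applications](ch2/README.md)
-  * [Data-Flow Analysis](ch2/02-00-dataflow-analysis.md)
-  * [Reaching Definitions Analysis](ch2/02-01-reaching-def-analysis.md)
-  * [Live Variable Analysis](ch2/02-02-live-var-analysis.md)
-  * [Available Expressions Analysis](ch2/02-03-available-exp-analysis.md)
 * [Chapter 3 - Data-Flow Analysis: Theory](ch3/README.md)
-  * [Data-Flow Analysis](ch3/03-00-dataflow-analysis.md)
-  * [Reaching Definitions Analysis](ch3/03-01-reaching-def-analysis.md)
-  * [Live Variable Analysis](ch3/03-02-live-var-analysis.md)
-  * [Available Expressions Analysis](ch3/03-03-available-exp-analysis.md)
+* [Chapter 4 - Interprocedural Analysis](ch4/README.md)
+  * [Introduction to Interprocedural Analysis](ch4/04-01-inter-analysis-spa.md)
diff --git a/ch1/01-01-whats-spa.md b/ch1/01-01-whats-spa.md
index c5751b1..f7d7194 100644
--- a/ch1/01-01-whats-spa.md
+++ b/ch1/01-01-whats-spa.md
@@ -1,5 +1,7 @@
 # First Encounter: What Is Static Analysis?

+**WARNING: you are about to enter an area that is still under construction.**
+
 > Static program analysis is the process of deriving **non-trivial** properties of a program purely by analyzing it **statically**, **without compiling** it into a binary and testing it against test cases.

 ## An abstract definition and interpretation of static program analysis
@@ -138,3 +140,11 @@ complete: every reported issue is a real one must analysis: outputs information that mu

 ### Terms commonly used in decision problems

+Decision problems in computing and all kinds of medical diagnoses refer to these concepts:
+1. True positive
+2. True negative
+3. False positive
+4. 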
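False negative
+
+There is a short introductory video [on YouTube](https://www.youtube.com/watch?v=rFwu69tuMiU).
+
diff --git a/ch1/01-02-ir.md b/ch1/01-02-ir.md
index 3dce6e3..e55cd1d 100644
--- a/ch1/01-02-ir.md
+++ b/ch1/01-02-ir.md
@@ -1,5 +1,13 @@
 # Intermediate Representation: The Input to a Static Analyzer

+If you have taken a compilers course and understand that:
+
+> The input to a static analyzer is a linear IR (three-address code, 3AC) rather than a tree IR (abstract syntax tree, AST)
+
+you can skip this section.
+
+**WARNING: you are about to enter an area that is still under construction.**
+
 ## Starting from the components of a compiler

 A typical compiler consists of the following parts:
diff --git a/ch1/README.md b/ch1/README.md
index 2e226ec..a41b4db 100644
--- a/ch1/README.md
+++ b/ch1/README.md
@@ -2,8 +2,14 @@

 This part formally introduces:

-- What static program analysis (hereafter, static analysis) is
-- What applications this technique has
-- Why it is worth studying and researching
-- 
+- What is static program analysis (hereafter, static analysis)?
+- How do we design a practical static program analyzer?

+**Note: Chapters 1 through 3 have corresponding videos on Bilibili. Reading Chapters 1 to 3 is not recommended for now. Until the author finds time to tidy up the manuscript, please watch the corresponding videos on Bilibili first.**
+
+The corresponding video is here:
+- [Lesson 2 - IR](https://www.bilibili.com/video/BV1zE411s77Z)
+
+Jump straight to the finished area (Chapter 4) from [here](https://ranger-nju.gitbook.io/static-program-analysis-book/ch4).
+
+**WARNING: you are about to enter an area that is still under construction.**
\ No newline at end of file
diff --git a/ch2/README.md b/ch2/README.md
index 5457a2b..a47469c 100644
--- a/ch2/README.md
+++ b/ch2/README.md
@@ -1,2 +1,11 @@
 # Chapter 2 - Theory and Applications of Data-Flow Analysis

+**Note: Chapters 1 through 3 have corresponding videos on Bilibili. Reading Chapters 1 to 3 is not recommended for now. Until the author finds time to tidy up the manuscript, please watch the corresponding videos on Bilibili first.**
+
+The corresponding videos are here:
+- [Lesson 3 - Data-Flow Analysis I](https://www.bilibili.com/video/BV1oE411K79d)
+- [Lesson 4 - Data-Flow Analysis II](https://www.bilibili.com/video/BV19741197zA)
+
+Jump straight to the finished area (Chapter 4) from [here](https://ranger-nju.gitbook.io/static-program-analysis-book/ch4).
+
+**WARNING: you are about to enter an area that is still under construction.**
\ No newline at end of file
diff --git a/ch3/README.md b/ch3/README.md
index 2b612e3..a0c59d7 100644
--- a/ch3/README.md
+++ b/ch3/README.md
@@ -2,180 +2,12 @@

 Helps you understand the data-flow analysis techniques described earlier, so that future review is more efficient. Even if you forget them, you can re-derive the gist yourself.

-## Foundations (recall discrete mathematics)
+**Note: Chapters 1 through 3 have corresponding videos on Bilibili. Reading Chapters 1 to 3 is not recommended for now. Until the author finds time to tidy up the manuscript, please watch the corresponding videos on Bilibili first.**

-### Yet Another(Math Instead of Program) View to Iterative Algorithm
+The corresponding lessons are here:
+- [Lesson 5 - Data-Flow Analysis Theory I](https://www.bilibili.com/video/BV1A741117it)
+- [Lesson 6 - Data-Flow Analysis Theory II](https://www.bilibili.com/video/BV1964y1M7nL)

-Forward: compute OUT from IN
-May: take the union at merges, initialize to bottom 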
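+Jump straight to the finished area (Chapter 4) from [here](https://ranger-nju.gitbook.io/static-program-analysis-book/ch4).

-Textual description -> symbolic description.
-
-*Subscript: node; superscript: iteration count. At the end, the results of iterations i and i+1 agree.*
-
-The lower-right corner leads to the fixed point (mathematical definition).
-
-- Existence of a solution (one always exists)
-- Uniqueness of the solution (assuming we obtained the best one)
-  - Greatest or least fixed point?
-- When does the algorithm produce the solution?
-
-**This is the essence of decades of program analysis problems. Next we cover some necessary mathematics.**
-
-Partial order: (tolerates incomparability)
-
-Remember it through examples.
-
-poset (partially ordered set)
-
-- Reflexive
-- Antisymmetric
-- Transitive
-
-Diagram of a lattice.
-
-
-Before formally introducing lattices, two concepts: upper and lower bounds.
-
-Note: *an upper or lower bound need not be an element of the set S.*
-
-Example.
-
-Define the least upper bound (lub, or join). The greatest lower bound is defined analogously.
-
-Example.
-
-Properties:
-1. A join or meet does not necessarily exist.
-2. If it exists, it is unique. (Proof: by contradiction.)
-
-Lattice: if every pair of elements has both a join and a meet, the poset is a lattice.
-
-Semilattice: only one of join and meet exists.
-
-Complete lattice: every subset has both a join and a meet.
-
-top and bottom.
-
-Property: a finite lattice is complete, but the converse does not hold (recall Ex1)
-
-Programs are usually finite (expressions, variables, and so on are finite). Data-flow analysis therefore focuses mainly on complete lattices.
-
-Product lattice: a new set, with a new relation, join, and meet.
-
-Properties:
-
-1. A product lattice is a lattice
-2. If a product lattice L is a product of complete lattices, then L is also complete
-
-The framework of data-flow analysis over lattices:
-
-Example.
-
-Summary: Data flow analysis can be seen as iteratively applying transfer functions and meet/join operations on the values of a lattice.
-
-The foundations are in place. Recall the three questions from the beginning:
-
-Recall monotonicity (the same definition as in high-school mathematics).
-
-**Fixed-point theorem**
-
-Preconditions: Complete Lattice/Function is Monotonic/Lattice is Finite.
-
-Result: gives a fixed procedure for computing the fixed point.
-
-Proof:
-1. Existence
-2. Leastness
-
-Existence: f(bottom) must be an element of L, so by the definition of bottom this inequality holds naturally. *Calculus: a bounded monotonic sequence converges.*
-
-Leastness: the relation holds whether we start from bottom or from any x.
-
-#### The ever-popular five yuan
-
-Starting from proofs in PL versus mathematics (the different directions of T).
-
-Narrow a problem down by clearly defining its relevant factors.
-
-1. Redefine the scope of the problem (context-sensitive pointer analysis can run faster than context-insensitive pointer analysis).
-2. Research and work tasks need their scope narrowed through communication.
-
-### Relate Iterative Algorithm to Fixed-Point Theorem
-
-The product lattice from the previous lesson.
-
-Transfer functions **should be designed to be** monotonic.
-Both join and meet can be proven monotonic.
-
-Answering the questions:
-1. Yes, by the fixed-point theorem.
-2. Yes, because it can be derived...
-
-When is the fixed point reached? 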
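-First, define the height.
-
-Hint: each node moves one step at a time, and a single node takes at most h steps in the worst case. In the worst case, all nodes together take $h*k$ steps.
-
-So the speed of the algorithm depends on the size of the program and on the lattice domain.
-
-### May and Must Analyses, a Lattice View
-
-Effectively a summary of the above.
-TODO: from the understanding built in the first three lessons to a one-diagram explanation
-
-Every analysis necessarily moves from an unsafe result toward a result that is safe but meaningless.
-Taking reaching definitions as the example, every block is initialized to all zeros, meaning that no definition can reach the given program point.
-The truth lies between Safe and Unsafe; the closer the result is to Safe, the lower the precision.
-
-Question: why is the analysis guaranteed to pass the truth and reach the fixed point?
-Whether the result is safe is determined by the transfer functions and the merging strategy, that is, by whoever designs the algorithm.
-
-
-
-### Meet/Join-Over-All-Paths Solution(MOP)
-
-> How precise is our result?
-
-Meet: once data flows converge, how do we handle them?
-
-A bit of PL background:
-Where do convergence points in a program come from? From branches and loops
-- Branches: if, switch, try catch, exception, promise(JS) ...
-- Loops: while, do while, for ...
-
-This is the benchmark for analysis precision.
-
-### Iterative Algorithm vs MOP
-
-IA: wait at the first convergence point each time and merge right there. No enumeration is needed, so the computation is cheaper and the result less precise.
-MOP: no need to wait; merge only at the final convergence point. Because paths must be enumerated, the computation is more expensive and the result more precise.
-
-There is a simple proof about precision (provided that F is monotonic).
-
-If F is distributive (high-school mathematics), then MOP is exactly as precise as our IA.
-
-Good news! This is the case when join/meet is set union/intersection (the three examples given earlier are all distributive).
-
-### Monotonicity and fixed points
-
-### Distributivity of functions
-
-### Constant Propagation
-
-> Given a variable x at program point p, determine whether x is guaranteed to hold a constant value at p. 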
In other words, for a variable X at program point P, decide whether X is a constant at that point.
-
-Undefined->Constant->NotAConstant
-
-The underscore denotes a wildcard, as is conventional in PL.
-
-
-other -> two UNDEFs, or one UNDEF and one Constant => the result cannot be NAC: for example, in x+y, if x is UNDEF the first time and 14 the second time while y is always 2, the two results would be NAC once and a constant once => such a Transfer Function would not be monotonic.
-
-## Worklist Algorithm
-
-WA is an optimized version of IA; once you understand IA, WA is easy to follow.
-
-IA is suited to analyzing and proving properties, whereas WA is what tends to be used in practice.
-
-Why is WA fast -> why is IA slow ->
\ No newline at end of file
+**WARNING: you are about to enter an area that is still under construction.**
\ No newline at end of file
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029224301293.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029224301293.png
new file mode 100644
index 0000000..8ba6858
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029224301293.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029224401395.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029224401395.png
new file mode 100644
index 0000000..5f35564
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029224401395.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029224437136.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029224437136.png
new file mode 100644
index 0000000..9515338
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029224437136.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029224506176.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029224506176.png
new file mode 100644
index 0000000..67794a9
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029224506176.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029224736803.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029224736803.png
new file mode 100644
index 0000000..f86f773
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029224736803.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029224820647.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029224820647.png
new file mode 100644
index 
0000000..fcc9e58
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029224820647.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029224941882.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029224941882.png
new file mode 100644
index 0000000..f9c13c8
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029224941882.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029225106724.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029225106724.png
new file mode 100644
index 0000000..f331a46
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029225106724.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029225304889.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029225304889.png
new file mode 100644
index 0000000..5d6a4cd
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029225304889.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029225350619.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029225350619.png
new file mode 100644
index 0000000..8b3d391
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029225350619.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029230138054.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029230138054.png
new file mode 100644
index 0000000..a2c6004
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029230138054.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029230224316.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029230224316.png
new file mode 100644
index 0000000..f64ef47
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029230224316.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029230504891.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029230504891.png
new file mode 
100644
index 0000000..96e828a
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029230504891.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029230535984.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029230535984.png
new file mode 100644
index 0000000..c650864
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029230535984.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029230622120.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029230622120.png
new file mode 100644
index 0000000..a68c5a3
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029230622120.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029230909895.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029230909895.png
new file mode 100644
index 0000000..86df7dd
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029230909895.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231106891.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231106891.png
new file mode 100644
index 0000000..2de175f
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231106891.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231132412.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231132412.png
new file mode 100644
index 0000000..3883411
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231132412.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231155238.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231155238.png
new file mode 100644
index 0000000..d79821f
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231155238.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231304304.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231304304.png
new 
file mode 100644
index 0000000..03ff1a6
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231304304.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231543567.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231543567.png
new file mode 100644
index 0000000..45eb281
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231543567.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231611608.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231611608.png
new file mode 100644
index 0000000..ed02778
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231611608.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231706834.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231706834.png
new file mode 100644
index 0000000..cc0c1ff
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231706834.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231905890.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231905890.png
new file mode 100644
index 0000000..b8ea304
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231905890.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231908883.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231908883.png
new file mode 100644
index 0000000..b8ea304
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231908883.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231936719.png b/ch4/04-01-inter-analysis-spa.assets/image-20201029231936719.png
new file mode 100644
index 0000000..dac7873
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231936719.png differ
diff --git a/ch4/04-01-inter-analysis-spa.assets/image-20201029231952670.png 
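b/ch4/04-01-inter-analysis-spa.assets/image-20201029231952670.png
new file mode 100644
index 0000000..3b1ba16
Binary files /dev/null and b/ch4/04-01-inter-analysis-spa.assets/image-20201029231952670.png differ
diff --git a/ch4/04-01-inter-analysis-spa.md b/ch4/04-01-inter-analysis-spa.md
new file mode 100644
index 0000000..f6060a9
--- /dev/null
+++ b/ch4/04-01-inter-analysis-spa.md
@@ -0,0 +1,268 @@
+# Introduction to Interprocedural Analysis
+
+This section introduces interprocedural analysis in four parts.
+
+1. Motivation
+   - **Why** introduce interprocedural analysis?
+2. Call Graph Construction (CHA)
+   - Introduces the Call Graph, **a data structure that interprocedural analysis requires**
+   - There are currently several ways to **build a Call Graph**; this section covers **the fastest one (Class hierarchy analysis, CHA for short)**
+3. Interprocedural Control-Flow Graph
+   - Earlier chapters focused on the CFG. With interprocedural analysis, we **add the corresponding elements** to the CFG to obtain the interprocedural control-flow graph (ICFG)
+   - Discusses the **additional operations** that the new elements require
+4. Interprocedural Data-Flow Analysis
+   - **Summarizes** interprocedural analysis through an example (the constant propagation analysis from Lab 1).
+
+# Motivation
+
+Earlier chapters did not consider method calls, yet method calls are extremely common in real programs. How, then, do we analyze programs that contain them? The simplest approach is to make the most conservative assumption, namely to **return NAC for every function call**. This **loses precision**. **Introducing interprocedural analysis improves precision.** Under the simplest approach, neither n nor y in the figure below is found to be a constant, even though we can see at a glance that their run-time values are n=10 and y=43.
+
+
+
+
+
+# Call Graph Construction (CHA)
+
+Next we discuss an essential data structure, the Call Graph.
+
+## Definition of Call Graph
+
+> A representation of calling relationships in the program.
+
+A call graph expresses calling relationships (which sounds rather odd when said in Chinese!!). A simple example:
+
+
+
+## Call Graph Construction
+
+There are many different ways to construct a Call Graph. We will cover the two extremes: the most precise and the fastest.
+
+
+
+### Call types in Java
+
+This lesson focuses on building call graphs for Java. To do so, we first need to understand the kinds of calls in Java. Java calls fall into three categories (no need to understand them thoroughly yet; details follow later):
+
+
+
+- Instruction: the **instruction in Java's IR**
+- Receiver objects: static methods do not depend on an instance
+- Target methods: expresses **the mapping from methods to IR instructions**
+- Number of target methods: a virtual call is tied to dynamic binding and polymorphism; it can correspond to several objects, and which concrete method implementation is called can only be decided at run time. So **a virtual call may have more than one possible target**.
+
+### Virtual call and dispatch
+
+We now focus on virtual calls.
+
+At run time, a virtual call is resolved to a concrete method based on two things:
+
+1. Type of object
+
+2. 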
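Method signature
+
+   - Signature = class type + method name + descriptor
+   - Descriptor = return type + parameter types
+
+
+
+
+
+Java's dispatch mechanism decides which method is actually called. Here c is a class definition and m is a method. If a method with the same name and descriptor as m can be found in c itself, that method of c is called; otherwise the search continues in the superclass.
+
+> We define function Dispatch(𝑐, 𝑚) to simulate the procedure of run-time method dispatch.
+
+
+
+
+
+**Exercise**
+
+Q: Which class's foo does each of the two calls to foo invoke?
+
+
+
+A: They invoke the foo methods defined in A and C, respectively.
+
+
+
+# Class Hierarchy Analysis (CHA)
+
+## Definition of CHA
+
+- Require the class **hierarchy information (inheritance structure)** of the whole program
+  - The inheritance structure of the whole program must be obtained first
+- Resolve a virtual call based on the declared type of receiver
+ variable of the call site
+  - Virtual calls are resolved through the declared type of the receiver variable
+  - Example of a receiver variable: in `a.foo()`, a is the receiver variable
+- Assume the receiver variable a may point to objects of class A
+ or all subclasses of A(Resolve target methods by looking up the class hierarchy of class A)
+  - Assume that a receiver variable of declared type A may point to objects of A or of any subclass of A
+
+## Call Resolution of CHA
+
+### Algorithm of Resolve
+
+The algorithm for resolving calls is described below.
+
+
+
+- The call site (cs) is the call statement, and m (method) is the corresponding method signature.
+- The set T holds the results found
+- The three if branches correspond to the three kinds of Java calls mentioned earlier
+  1. Static call (all static method calls)
+  2. Special call (calls through the super keyword, constructor calls, and private instance methods)
+  3. Virtual call (all other calls)
+
+**Static call**
+
+- Readers unfamiliar with static methods in OOP can refer to [this page](https://www.geeksforgeeks.org/static-methods-vs-instance-methods-java/). Concretely, a static method call is written with the class name in front, while a non-static call is written with a variable or pointer name in front. Static methods do not depend on an instance.
+
+**Special call**
+
+- Superclass instance methods (the super keyword) are the most complex case, so we consider them first
+
+  - Why is the Dispatch function needed? Consider this situation:
+
+
+- Private instance methods and constructors (which are always either implemented by the class or provided by default) are both given in the class's own implementation. Using the Dispatch function covers all three cases and simplifies the code.
+
+**Virtual call**
+
+- In the example, the receiver variable is a.
+
+- Dispatch is called for the receiver's declared type c and for every direct or indirect subclass of c
+
+**An example**
+
+## Characteristics of CHA
+
+1. It only considers the inheritance structure, so it is **fast**
+2. 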
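Because it ignores data-flow and control-flow information, it is **not very precise**
+
+## Applications of CHA
+
+CHA is commonly used in IDEs to give users hints. For example, write a small piece of test code and see which method signatures b.foo() might call:
+
+
+
+## Call Graph Construction
+
+### Idea
+
+- Build call graph for whole program via CHA
+  - Construct the call graph of the entire program with CHA
+- Start from entry methods (focus on main method)
+  - Usually start from the main method
+- For each reachable method 𝑚, resolve target methods for each call site 𝑐𝑠 in 𝑚 via CHA (Resolve(𝑐𝑠))
+  - Recursively process every reachable method
+- Repeat until no new method is discovered
+  - Stop when no new reachable method can be added
+- The whole process closely resembles computing a closure in computer science
+
+
+
+### Algorithm
+
+
+
+- The worklist (WL) records the methods that still need to be processed
+- The call graph is the target to build; it is a set of call edges
+- Reachable methods (RM) are the methods already processed; when a new item is taken from the worklist, anything already in RM does not need to be processed again
+
+### Example
+
+*I do not want to be a heartless slide-copying machine either, but markdown's support for drawing my own figures is just too poor (x.*
+
+1. Initialization
+2. After main is processed, A.foo() is added to WL
+3. Some intermediate steps are omitted. Here, when C.bar() is processed, it does call A.foo(), but since A.foo() has already been processed (it is in the set RM), it is not processed again
+4. Here C.m() is unreachable dead code
+
+*Note: ignore the constructor call made by new A(); it is not the point of this example.*
+
+**This example summarizes the subsection. If you cannot follow it and derive it on your own, consider rereading the subsection.**
+
+## Interprocedural Control-Flow Graph
+
+> ICFG = CFG+call & return edges
+
+An ICFG can be constructed from the CFG by adding two kinds of edges.
+
+1. Call edges: from call sites to the entry nodes of their callees
+2. Return edges: from return statements of the callees to the statements following their call sites (i.e., return sites)
+
+For example:
+
+
+
+
+
+# Interprocedural Data-Flow Analysis
+
+## Definition and comparison
+
+There is currently no standard approach in this area of analysis. We first compare interprocedural and intraprocedural analysis, and explain using constant propagation as the example.
+
+
+
+Edge transfer handles the newly introduced call & return edges. To do so, we need to **add three kinds of transfer functions on top of the CFG from earlier chapters.**
+
+- Call edge transfer
+  - transfer data flow from call node to the
+ entry node of callee (along call edges)
+  - Passes arguments
+- Return edge transfer
+  - transfer data flow from return node of
+ the callee to the return site (along return edges)
+  - Passes return values
+- Node transfer
+  - Same as intra-procedural constant propagation,
+ plus: for each call node, kill data-flow value for the LHS(Left hand side) variable. Its value will flow to return site along the return edges
+
+
+
+## Example
+
+
+
+
+
+### A small question
+
+Does this edge really need to exist?
+
+
+
+> Such edge (from call site to return site) is named call-to-return edge. 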
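It allows the analysis to propagate local data-flow (a=6 in this case) on ICFG.
+
+Without this edge, a would have to "go abroad" and waste the planet's resources: the value of a would have to be remembered throughout the entire analysis of the callee, which wastes a large amount of memory while the analysis runs.
+
+
+
+Remember to kill the value of the variable on the left-hand side at the call statement; otherwise the result becomes imprecise, for example:
+
+
+
+
+
+# How important is interprocedural analysis?
+
+Having come this far, let us return to the beginning of the story and see how much precision interprocedural analysis actually gains. The complete derivation for the example above, with interprocedural analysis applied, is as follows:
+
+
+
+With intraprocedural analysis alone, **precision drops sharply**:
+
+
+
+# Sum up
+
+1. How to build call graph via class hierarchy analysis
+   1. How to build a call graph with CHA
+2. Concept of interprocedural control-flow graph
+   1. The concept of the interprocedural CFG
+3. Concept of interprocedural data-flow analysis
+   1. The concept of interprocedural data-flow analysis
+4. Interprocedural constant propagation
+   1. An example: constant propagation with interprocedural analysis
\ No newline at end of file
diff --git a/ch4/README.md b/ch4/README.md
new file mode 100644
index 0000000..d1def00
--- /dev/null
+++ b/ch4/README.md
@@ -0,0 +1,5 @@
+# Introduction to Interprocedural Analysis
+
+You are about to reach the area without video support. If you would like to watch the videos, see [Lesson 7](https://www.bilibili.com/video/BV1GQ4y1T7zm) and [Lesson 8](https://www.bilibili.com/video/BV1gg4y1z78p).
+
+The gitbook format aims to help readers grasp the key points faster (although, out of laziness, some of the English is still left untranslated), while the videos present the content dynamically. Comments and suggestions are welcome! (See the [Github repo](https://github.com/RangerNJU/Static-Program-Analysis-Book) for how.)
\ No newline at end of file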
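
The Dispatch, Resolve, and worklist steps that the new `ch4/04-01-inter-analysis-spa.md` describes can be sketched roughly as follows. This is a minimal Python model and not part of the patch: the `Hierarchy` class, the tuple encoding of call sites, and the toy program (classes A, B, C, loosely modeled on the example walked through in the chapter) are all assumptions made for illustration, and a real analyzer would work on Java IR rather than strings.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: (kind, class named at the call site, method signature).
CallSite = Tuple[str, str, str]


class Hierarchy:
    """Toy class hierarchy: single inheritance plus the methods each class declares."""

    def __init__(self) -> None:
        self.superclass: Dict[str, Optional[str]] = {}
        self.declared: Dict[str, Set[str]] = {}

    def add_class(self, name: str, superclass: Optional[str] = None, methods=()) -> None:
        self.superclass[name] = superclass
        self.declared[name] = set(methods)

    def subclasses_of(self, name: str) -> Set[str]:
        """Return `name` together with all of its direct and indirect subclasses."""
        result = {name}
        changed = True
        while changed:
            changed = False
            for cls, sup in self.superclass.items():
                if sup in result and cls not in result:
                    result.add(cls)
                    changed = True
        return result


def dispatch(h: Hierarchy, cls: str, sig: str) -> str:
    """Dispatch(c, m): use c's own method if it declares one, else look in the superclass."""
    current: Optional[str] = cls
    while current is not None:
        if sig in h.declared[current]:
            return f"{current}.{sig}"
        current = h.superclass[current]
    raise LookupError(f"no implementation of {sig} found for {cls}")


def resolve(h: Hierarchy, call_site: CallSite) -> Set[str]:
    """Resolve(cs): the possible target methods of a call site under CHA."""
    kind, cls, sig = call_site
    if kind == "static":
        return {f"{cls}.{sig}"}
    if kind == "special":
        return {dispatch(h, cls, sig)}
    if kind == "virtual":
        # Declared receiver type plus every direct or indirect subclass.
        return {dispatch(h, sub, sig) for sub in h.subclasses_of(cls)}
    raise ValueError(f"unknown call kind: {kind}")


def build_call_graph(h: Hierarchy, call_sites: Dict[str, List[CallSite]], entry: str):
    """Worklist construction: returns (reachable methods, set of caller -> callee edges)."""
    worklist = [entry]
    reachable: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    while worklist:
        method = worklist.pop()
        if method in reachable:
            continue  # already processed, skip it
        reachable.add(method)
        for cs in call_sites.get(method, []):
            for target in resolve(h, cs):
                edges.add((method, target))
                worklist.append(target)
    return reachable, edges


# Hypothetical program used only for this sketch.
h = Hierarchy()
h.add_class("A", None, ["main()", "foo()", "bar()"])
h.add_class("B", "A", ["bar()"])
h.add_class("C", "A", ["bar()", "m()"])
call_sites = {
    "A.main()": [("static", "A", "foo()")],
    "A.foo()": [("virtual", "A", "bar()")],
    "A.bar()": [("virtual", "C", "bar()")],
    "C.bar()": [("static", "A", "foo()")],
}
reachable, edges = build_call_graph(h, call_sites, "A.main()")
```

In this sketch `C.m()` never enters `reachable`, which mirrors the dead-code observation in the chapter's example, and the virtual call on a receiver declared as A resolves to the `bar()` of A, B, and C because CHA looks only at the declared type and the hierarchy.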