Skip to content

Commit

Permalink
Update Lec 15/16 on IFDS & Soundiness.
Browse files Browse the repository at this point in the history
  • Loading branch information
RangerNJU committed Dec 31, 2020
1 parent 63481a9 commit ed39031
Show file tree
Hide file tree
Showing 94 changed files with 475 additions and 46 deletions.
Binary file added README.assets/image-20201231205814145.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
8 changes: 7 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,13 @@ Getting started with static program analysis. Read this and start writing your f

1. Oct, 2020. 设立Repo,一个月内解锁Star、Fork和PR。
2. Nov. 将IR与Data Flow Analysis的相关内容暂时移出仓库,更新七至十课——Interprocedural Analysis、Pointer Analysis-Introduction and Foundations。
3. Dec. 更新十一和十二两课——Context Sensitive Pointer Analysis,指针分析大结局。🥳
3. Dec. 更新十一和十二两课——Context Sensitive Pointer Analysis,指针分析大结局。🥳 更新十三十四课介绍指针分析的安全应用Taint Analysis和使用Datalog实现声明式指针分析算法。 更新十五十六课介绍IFDS分析框架和Soundiness。

图文的主体部分更新完毕,撒花~

后续会考虑录制一些视频补充对动态例子的讲解。

![感谢读者支持 :P](README.assets/image-20201231205814145.png)

## 这一《静态程序分析》教程对谁有用?

Expand Down
13 changes: 8 additions & 5 deletions SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

## 前言 <a id="ch0"></a>

* [写在前面](ch0/README.md)
* [写在前面](ch0/ch0.md)
* [为什么是这本书?](ch0/00-01-why-this-book.md)
* [资料来源与版权信息](ch0/00-02-sources-and-license.md)

Expand All @@ -26,10 +26,13 @@
* [上下文敏感分析(上)](ch3/context-sensitivity/03-04-context-sensitivity.md)
* [上下文敏感分析(下)](ch3/context-sensitivity/03-05-cs2.md)

## 指针分析应用 <a id="ch4"></a>
## 指针分析应用与声明式实现 <a id="ch4"></a>

* [安全应用——污点分析](ch4/04-01-security.md)
* [应用——污点分析](ch4/04-01-security.md)
* [实现——声明式指针分析](ch4/04-02-Datalog-Based-PA.md)

## 指针分析的另一种样子 <a id="ch5"></a>
## 其他话题 <a id="ch5"></a>

* [另一种静态分析框架——IFDS](ch5/05-01-IFDS.md)
* [从Soundness到Soundiness](ch5/05-02-Soundiness.md)

* [声明式指针分析算法](ch5/05-01-Datalog-Based-PA.md)
File renamed without changes.
2 changes: 1 addition & 1 deletion ch1/ch1.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,5 +21,5 @@
* [第五课-数据流分析理论一](https://www.bilibili.com/video/BV1A741117it)
* [第六课-数据流分析理论二](https://www.bilibili.com/video/BV1964y1M7nL)

[这里](https://ranger-nju.gitbook.io/static-program-analysis-book/ch4)直接跳转到施工完毕区域。
[这里](https://ranger-nju.gitbook.io/static-program-analysis-book/ch2)直接跳转到施工完毕区域。

4 changes: 2 additions & 2 deletions ch3/context-sensitivity/03-05-cs2.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,11 +47,11 @@
在这一部分,我们只需要理解Select的作用(对于Select的具体实现,会在后面讲解):

* ProcessCall接收两个参数,意义是:带有上下文标记的x新增一个带有上下文标记指向目标o。
* Select接收参数(这里虽然有4个参数,但并非每种实现方式都需要用到所有的4个参数)
* m代表目标方法。
* Select接收参数(这里虽然有3个参数,但并非每种实现方式都需要用到所有的3个参数)
* c。x的上下文标记
* l。调用点本身(call site),在例子中以行号标识调用点
* $$c' : o_i $$。receiver object
* m。 目标method
* Select返回callee的context $$ c^t$$

## Context Sensitivity Variants
Expand Down
2 changes: 1 addition & 1 deletion ch4/04-01-security.md
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ More on Integrity-a Board Definition(在信息流安全以外的语境中,Inte

More:

- 通过观察电量消耗、网络流量特征、缓存命中率、服务器相应时长特征,都能以某种方式获得某种程度的机密信息。
- 通过观察电量消耗、网络流量特征、缓存命中率、服务器响应时长特征,都能以某种方式获得某种程度的机密信息。

- Side Channel: "AF缺乏淡水"

Expand Down
64 changes: 32 additions & 32 deletions ch5/05-01-Datalog-Based-PA.md → ch4/04-02-Datalog-Based-PA.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,11 @@ Datalog是一种命令式(Declarative)的编程语言。

如果用Imperative的编程方式做指针分析,很麻烦。

<img src="05-01-Datalog-Based-PA.assets/image-20201223184349163.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223184349163.png" style="zoom:50%;" />

而如果用Declarative的方式做编程分析,能够极大地简化实现。

<img src="05-01-Datalog-Based-PA.assets/image-20201223184415502.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223184415502.png" style="zoom:50%;" />

# Introduction to Datalog

Expand All @@ -36,18 +36,18 @@ Datalog是一种命令式(Declarative)的编程语言。

谓词(Predicates)是datalog中的一个主要组成部分,可以看作是数据所组成的一个表(table of data),每一行都代表一个事实(fact)。例如:

<img src="05-01-Datalog-Based-PA.assets/image-20201223185015690.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223185015690.png" style="zoom:50%;" />

### Atoms

原子(Atoms)是Datalog中的基本元素,组成和例子如下:

<img src="05-01-Datalog-Based-PA.assets/image-20201223185231533.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223185231533.png" style="zoom:50%;" />

Atoms可以分成两类

- Relational Atoms
- <img src="05-01-Datalog-Based-PA.assets/image-20201223185504296.png" style="zoom:50%;" />
- <img src="04-02-Datalog-Based-PA.assets/image-20201223185504296.png" style="zoom:50%;" />
- Arithmetic Atoms
-`age >= 18`

Expand All @@ -57,17 +57,17 @@ Atoms可以分成两类

Datalog使用规则来进行推导(inference),其定义如下:

<img src="05-01-Datalog-Based-PA.assets/image-20201223185750701.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223185750701.png" style="zoom:50%;" />

当Body中的所有表达式都为True时,Head才为True,如:

<img src="05-01-Datalog-Based-PA.assets/image-20201223185957740.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223185957740.png" style="zoom:50%;" />

求解过程(Interpretation of Datalog Rules)——枚举Body中所有关系表达式的可能取值组合,进而得到新的predicate/table。例如:

<img src="05-01-Datalog-Based-PA.assets/image-20201223190539380.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223190539380.png" style="zoom:50%;" />

<img src="05-01-Datalog-Based-PA.assets/image-20201223190710495.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223190710495.png" style="zoom:50%;" />

谓词分为两类:EDB & IDB。

Expand All @@ -78,23 +78,23 @@ Datalog使用规则来进行推导(inference),其定义如下:

例如:

<img src="05-01-Datalog-Based-PA.assets/image-20201223190916420.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223190916420.png" style="zoom:50%;" />

### Logic Or

以上例子实际上是逻辑与,而逻辑或则有两种实现方式:

<img src="05-01-Datalog-Based-PA.assets/image-20201223191303096.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223191303096.png" style="zoom:50%;" />

此外还需要考虑运算优先级的问题,建议在书写程序时用括号明确地标识期望的运算优先级:`H<-A,(B;C)`

### Logic Not/Negation

<img src="05-01-Datalog-Based-PA.assets/image-20201223191451507.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223191451507.png" style="zoom:50%;" />

### Recursion

<img src="05-01-Datalog-Based-PA.assets/image-20201223191716873.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223191716873.png" style="zoom:50%;" />

## Rule Satety

Expand All @@ -103,7 +103,7 @@ Datalog使用规则来进行推导(inference),其定义如下:
- `A(x) <- B(y), x > y.`
- `A(x) <- B(y), !C(x,y).`

<img src="05-01-Datalog-Based-PA.assets/image-20201223192111827.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223192111827.png" style="zoom:50%;" />

~~第一次学看不出问题也没问题的~~

Expand All @@ -119,13 +119,13 @@ Datalog使用规则来进行推导(inference),其定义如下:

`A(x) <- B(x), !A(x)`

<img src="05-01-Datalog-Based-PA.assets/image-20201223193351919.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223193351919.png" style="zoom:50%;" />

对应地有第二个准则:**不要把recursion和negation写在同一条规则里***即避免写出非A推导出A这样的规则*

## Execution of Datalog Program

<img src="05-01-Datalog-Based-PA.assets/image-20201223193808413.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223193808413.png" style="zoom:50%;" />

Datalog的两大重要特性:

Expand All @@ -142,25 +142,25 @@ Datalog的两大重要特性:
- IDB:指针分析的结果
- Rules:指针分析的规则

<img src="05-01-Datalog-Based-PA.assets/image-20201223194245876.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223194245876.png" style="zoom:50%;" />

和之前一样,我们把Call放到最后处理。

## Datalog Model-EDB&IDB

我们首先需要对前四条语句建模。输入的EDB代表了4个存储相应类型语句的table,输出为Variable和Field的指向关系。

<img src="05-01-Datalog-Based-PA.assets/image-20201223194326258.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223194326258.png" style="zoom:50%;" />

一个关于EDB的例子如下:

<img src="05-01-Datalog-Based-PA.assets/image-20201223194559347.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223194559347.png" style="zoom:50%;" />

## Datalog Rules

Body按照红线所示代表前提,Head按照蓝线所示代表结论。

<img src="05-01-Datalog-Based-PA.assets/image-20201223195004344.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223195004344.png" style="zoom:50%;" />

## Example

Expand All @@ -178,71 +178,71 @@ e = d.f;

结果如下:

<img src="05-01-Datalog-Based-PA.assets/image-20201223195501180.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223195501180.png" style="zoom:50%;" />

## PA with Calls

回顾指针分析中Call的规则:

<img src="05-01-Datalog-Based-PA.assets/image-20201223200506410.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223200506410.png" style="zoom:50%;" />

### This

首先,我们需要引入新的EDB和IDB:

(VCall即Virtual Call,ThisVar即This Variable)

<img src="05-01-Datalog-Based-PA.assets/image-20201223200243589.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223200243589.png" style="zoom:50%;" />

我们能够同时知道三个信息:

1. this指向对象o
2. 方法m是可达的
3. 方法m可达是因为在l行处存在对方法m的调用

<img src="05-01-Datalog-Based-PA.assets/image-20201223200617134.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223200617134.png" style="zoom:50%;" />

### Parameters

接下来要处理参数的传递,与之前类似,引入EDB标识Argument(调用语句行号,参数标号和参数本身)和Parameter(被调用方法,参数标号和参数本身):

<img src="05-01-Datalog-Based-PA.assets/image-20201223200945544.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223200945544.png" style="zoom:50%;" />

而对应用Datalog书写的规则如果用自然语言描述,就是处理行号l处对m的调用时,根据形参和实参的信息,将实参已经有的指向关系传递给形参数。

<img src="05-01-Datalog-Based-PA.assets/image-20201223201109543.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223201109543.png" style="zoom:50%;" />

### Return Value

引入EDB:

<img src="05-01-Datalog-Based-PA.assets/image-20201223201354991.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223201354991.png" style="zoom:50%;" />

对应的Datalog Rule,处理返回值的指向关系:

<img src="05-01-Datalog-Based-PA.assets/image-20201223201428109.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223201428109.png" style="zoom:50%;" />

### Sum up

以上三个部分总结起来,就能得到以下的Datalog Rules。

<img src="05-01-Datalog-Based-PA.assets/image-20201223201612811.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223201612811.png" style="zoom:50%;" />

进而能得到全程序的分析算法如下。值得一提的是,在VarPointsTo规则中,添加Reachable(m)跳过不可达方法中的对象。而其他规则不需要加这一条件,则是因为它们都有VarPointsTo规则作为Body的一部分。

<img src="05-01-Datalog-Based-PA.assets/image-20201223201746117.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223201746117.png" style="zoom:50%;" />

# Taint Analysis via Datalog

## Datalog Model

同样需要用户提供Source和Sink。输出被标记的数据可能流到的Sink方法。<img src="05-01-Datalog-Based-PA.assets/image-20201223201956852.png" style="zoom:50%;" />
同样需要用户提供Source和Sink。输出被标记的数据可能流到的Sink方法。<img src="04-02-Datalog-Based-PA.assets/image-20201223201956852.png" style="zoom:50%;" />

## Datalog Rules

说明:这里参数列表中用``表示通配符,即不关心枚举时这个位置取什么值。

<img src="05-01-Datalog-Based-PA.assets/image-20201223202140289.png" style="zoom:50%;" />
<img src="04-02-Datalog-Based-PA.assets/image-20201223202140289.png" style="zoom:50%;" />

# Key Points

Expand Down
7 changes: 5 additions & 2 deletions ch4/ch4.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
# 指针分析应用
# 指针分析应用、实现与其他

接下来说说基于指针分析的污点分析(Taint Analysis)。
接下来说说基于指针分析的静态分析相关技术知识:

- 安全应用方向的污点分析(Taint Analysis)。
- 如何用声明式的语言Datalog实现指针分析。
Binary file added ch5/05-01-IFDS.assets/image-20201231190743854.png
Binary file added ch5/05-01-IFDS.assets/image-20201231191027738.png
Binary file added ch5/05-01-IFDS.assets/image-20201231191043066.png
Binary file added ch5/05-01-IFDS.assets/image-20201231192128228.png
Binary file added ch5/05-01-IFDS.assets/image-20201231193404292.png
Loading

0 comments on commit ed39031

Please sign in to comment.