diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..4cb12d8 --- /dev/null +++ b/.gitignore @@ -0,0 +1,16 @@ +# Node rules: +## Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) +.grunt + +## Dependency directory +## Commenting this out is preferred by some people, see +## https://docs.npmjs.com/misc/faq#should-i-check-my-node_modules-folder-into-git +node_modules + +# Book build output +_book + +# eBook build output +*.epub +*.mobi +*.pdf diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..e5ac700 --- /dev/null +++ b/LICENSE @@ -0,0 +1,427 @@ +Attribution-ShareAlike 4.0 International + +======================================================================= + +Creative Commons Corporation ("Creative Commons") is not a law firm and +does not provide legal services or legal advice. Distribution of +Creative Commons public licenses does not create a lawyer-client or +other relationship. Creative Commons makes its licenses and related +information available on an "as-is" basis. Creative Commons gives no +warranties regarding its licenses, any material licensed under their +terms and conditions, or any related information. Creative Commons +disclaims all liability for damages resulting from their use to the +fullest extent possible. + +Using Creative Commons Public Licenses + +Creative Commons public licenses provide a standard set of terms and +conditions that creators and other rights holders may use to share +original works of authorship and other material subject to copyright +and certain other rights specified in the public license below. The +following considerations are for informational purposes only, are not +exhaustive, and do not form part of our licenses. + + Considerations for licensors: Our public licenses are + intended for use by those authorized to give the public + permission to use material in ways otherwise restricted by + copyright and certain other rights. 
Our licenses are + irrevocable. Licensors should read and understand the terms + and conditions of the license they choose before applying it. + Licensors should also secure all rights necessary before + applying our licenses so that the public can reuse the + material as expected. Licensors should clearly mark any + material not subject to the license. This includes other CC- + licensed material, or material used under an exception or + limitation to copyright. More considerations for licensors: + wiki.creativecommons.org/Considerations_for_licensors + + Considerations for the public: By using one of our public + licenses, a licensor grants the public permission to use the + licensed material under specified terms and conditions. If + the licensor's permission is not necessary for any reason--for + example, because of any applicable exception or limitation to + copyright--then that use is not regulated by the license. Our + licenses grant only permissions under copyright and certain + other rights that a licensor has authority to grant. Use of + the licensed material may still be restricted for other + reasons, including because others have copyright or other + rights in the material. A licensor may make special requests, + such as asking that all changes be marked or described. + Although not required by our licenses, you are encouraged to + respect those requests where reasonable. More_considerations + for the public: + wiki.creativecommons.org/Considerations_for_licensees + +======================================================================= + +Creative Commons Attribution-ShareAlike 4.0 International Public +License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution-ShareAlike 4.0 International Public License ("Public +License"). 
To the extent this Public License may be interpreted as a +contract, You are granted the Licensed Rights in consideration of Your +acceptance of these terms and conditions, and the Licensor grants You +such rights in consideration of benefits the Licensor receives from +making the Licensed Material available under these terms and +conditions. + + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. BY-SA Compatible License means a license listed at + creativecommons.org/compatiblelicenses, approved by Creative + Commons as essentially the equivalent of this Public License. + + d. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + + e. 
Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + f. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + g. License Elements means the license attributes listed in the name + of a Creative Commons Public License. The License Elements of this + Public License are Attribution and ShareAlike. + + h. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + i. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + j. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + k. Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + l. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + m. 
You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part; and + + b. produce, reproduce, and Share Adapted Material. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. Additional offer from the Licensor -- Adapted Material. 
+ Every recipient of Adapted Material from You + automatically receives an offer from the Licensor to + exercise the Licensed Rights in the Adapted Material + under the conditions of the Adapter's License You apply. + + c. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties. + + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. 
retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + b. ShareAlike. + + In addition to the conditions in Section 3(a), if You Share + Adapted Material You produce, the following conditions also apply. + + 1. The Adapter's License You apply must be a Creative Commons + license with the same License Elements, this version or + later, or a BY-SA Compatible License. + + 2. You must include the text of, or the URI or hyperlink to, the + Adapter's License You apply. You may satisfy this condition + in any reasonable manner based on the medium, means, and + context in which You Share Adapted Material. + + 3. 
You may not offer or impose any additional or different terms + or conditions on, or apply any Effective Technological + Measures to, Adapted Material that restrict exercise of the + rights granted under the Adapter's License You apply. + + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material, + + including for purposes of Section 3(b); and + c. You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. + + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + + +Section 7 -- Other Terms and Conditions. + + a. 
The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the “Licensor.” The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. 
Except for the limited purpose of indicating that
+material is shared under a Creative Commons public license or as
+otherwise permitted by the Creative Commons policies published at
+creativecommons.org/policies, Creative Commons does not authorize the
+use of the trademark "Creative Commons" or any other trademark or logo
+of Creative Commons without its prior written consent including,
+without limitation, in connection with any unauthorized modifications
+to any of its public licenses or any other arrangements,
+understandings, or agreements concerning use of licensed material. For
+the avoidance of doubt, this paragraph does not form part of the
+public licenses.
+
+Creative Commons may be contacted at creativecommons.org.
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..469baf1
--- /dev/null
+++ b/README.md
@@ -0,0 +1,12 @@
+# Static-Program-Analysis-Book
+
+[Read online on GitBook](https://ranger-nju.gitbook.io/static-program-analysis-book/)
+
+[GitHub repository](https://github.com/RangerNJU/Static-Program-Analysis-Book)
+
+----
+
+Getting started with static program analysis. Read this and start writing your first static program analyzer!
+
+静态程序分析入门。阅读此书并着手编写你的第一个静态程序分析器吧!
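+
diff --git a/SUMMARY.md b/SUMMARY.md
new file mode 100644
index 0000000..2870901
--- /dev/null
+++ b/SUMMARY.md
@@ -0,0 +1,15 @@
+# Table of contents
+
+* [Chapter 0 - Preface](src/ch0/README.md)
+  * [What on earth is static program analysis?](src/ch0/00-01-why-SPA.md)
+  * [Why this book?](src/ch0/00-02-why-this-book.md)
+  * [Sources and License](src/ch0/00-03-sources-and-license.md)
+* [Chapter 1 - Introduction to Static Program Analysis](src/ch1/README.md)
+  * [First look: what is static analysis?](src/ch1/01-01-whats-spa.md)
+  * [Intermediate representation: the input to a static analyzer](src/ch1/01-02-ir.md)
+* [Chapter 2 - Theory and Applications of Dataflow Analysis](src/ch2/README.md)
+  * [Dataflow analysis](src/ch2/02-00-dataflow-analysis.md)
+  * [Reaching definitions analysis](src/ch2/02-01-reaching-def-analysis.md)
+  * [Live variable analysis](src/ch2/02-02-live-var-analysis.md)
+  * [Available expressions analysis](src/ch2/02-03-available-exp-analysis.md)
+
diff --git a/src/README.md b/src/README.md
new file mode 100644
index 0000000..f409dc2
--- /dev/null
+++ b/src/README.md
@@ -0,0 +1,11 @@
+# Static-Program-Analysis-Book
+
+[Read online on GitBook](https://ranger-nju.gitbook.io/static-program-analysis-book/)
+
+[GitHub repository](https://github.com/RangerNJU/Static-Program-Analysis-Book)
+
+----
+
+Getting started with static program analysis. Read this and start writing your first static program analyzer!
+
+静态程序分析入门。阅读此书并着手编写你的第一个静态程序分析器吧!
diff --git a/src/ch0/00-01-why-SPA.assets/PL.png b/src/ch0/00-01-why-SPA.assets/PL.png
new file mode 100644
index 0000000..88cbf64
Binary files /dev/null and b/src/ch0/00-01-why-SPA.assets/PL.png differ
diff --git a/src/ch0/00-01-why-SPA.md b/src/ch0/00-01-why-SPA.md
new file mode 100644
index 0000000..d73eeed
--- /dev/null
+++ b/src/ch0/00-01-why-SPA.md
@@ -0,0 +1,35 @@
+# Why you should learn about static program analysis
+
+## Where it fits
+
+**Static program analysis** is a subfield on the **application** side of **programming languages** research.
+
+![](00-01-why-SPA.assets/PL.png)
+
+(TODO: give some concrete examples)
+
+Today's programming languages fall into three main categories:
+- Imperative (C, C++, Java)
+- Functional (Scala, Haskell)
+- Logic (Prolog, SQL)
+
+*The rest of this book focuses on the analysis of imperative languages.*
+
+The field of **programming languages** today faces a dragon: `The core of our languages has not changed for decades, yet software keeps growing rapidly in size and complexity. How do we ensure that programs are reliable?`
+
+## Applications
+
+Static program analysis is one of the swords for slaying that dragon. Mastering and applying it can:
+
+1. Improve program reliability: null pointer dereference, memory leak, etc.
+2. Improve program security: private information leak, injection attack, etc.
+3. Provide the groundwork for compiler optimization: dead code elimination, code motion (such as hoisting code out of loops), etc.
+4. Aid program understanding: IDE call hierarchy, type indication, and other IDE features.
+
+## Demand
+
+In academia, static program analysis can be applied to almost every research direction that deals with programs.
+
+In industry, large companies abroad such as Google and IBM have begun building their own static program analysis teams. In China, companies such as Huawei and Alibaba are also looking for people with static program analysis expertise.
+
+(TODO: add more detailed examples)
\ No newline at end of file
diff --git a/src/ch0/00-02-why-this-book.md b/src/ch0/00-02-why-this-book.md
new file mode 100644
index 0000000..3cc75b5
--- /dev/null
+++ b/src/ch0/00-02-why-this-book.md
@@ -0,0 +1,39 @@
+# Why this book?
+
+> This static program analysis you talk about sounds kind of useful. So where can I learn it?
+
+## Why should you read this book?
+
+**1. Both the Chinese-language and English-language communities currently lack introductory material for this field.**
+
+**2. This book guides readers through the field by combining theory with practice.**
+
+### The Chinese-language community
+
+If you search for the relevant Chinese keywords, the top results are all notes on the publicly available Bilibili videos of a course taught by Professor Li at a certain university in Nanjing. Many of them are well written, **but they are not tutorials aimed at general learners and developers**. The two differ in an important way:
+
+- Notes are written **for oneself** to review. A set of notes is good as long as it lets its author quickly pick up the key points they understood at the time.
+- Tutorials are written **for learners**. A good tutorial lets learners quickly grasp the key points of a field and lays the groundwork for applying them further.
+
+### The English-language community
+
+If you search for English keywords, you should find textbook-style PDFs and papers by leading researchers, or open-source static analyzers. But here too **tutorials are lacking**. Most material is either hard to follow or too shallow. From a few informal conversations, I also learned that industry still regards static program analysis as immature.
+
+### Combining theory and practice
+
+This book covers both theory and practice, an approach inspired mainly by *The Rust Programming Language*.
+
+### Goals of this book
+
+To enable most readers who have some programming experience, have taken the core undergraduate computer science courses, and have the ability (not necessarily the diploma) of a fourth-year undergraduate or above to:
+
+1. Understand the theory fairly easily while reading
+2. Build a prototype implementation on their own
+3. Pick up bits of knowledge from different areas of CS along the way
+
+## Why write this book?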
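+
+- The main motivation is the passion I felt in the professor's live lectures.
+- Few people have ventured into this field, and writing about it suits my personality.
+- I have benefited a great deal from the open-source community, and now that I have found a suitable opportunity, I hope to give something back (especially to the Chinese-language community).
+- Two important parts of human progress are passing knowledge on and pushing it forward. Pioneers explore and advance a field, but someone also needs to spread the new knowledge.
diff --git a/src/ch0/00-03-sources-and-license.md b/src/ch0/00-03-sources-and-license.md
new file mode 100644
index 0000000..d73168a
--- /dev/null
+++ b/src/ch0/00-03-sources-and-license.md
@@ -0,0 +1,13 @@
+# Sources and License
+
+## Sources
+
+This introductory tutorial is based on the Software Analysis course at Nanjing University.
+
+[PASCAL research group homepage](https://pascal-group.bitbucket.io/teaching.html)
+
+## License
+
+The text of this tutorial is licensed under CC BY-NC-SA.
+
+Unless otherwise noted, the images come from the course slides and are used in this book with the permission of the slides' author.
diff --git a/src/ch0/README.md b/src/ch0/README.md
new file mode 100644
index 0000000..8d694ad
--- /dev/null
+++ b/src/ch0/README.md
@@ -0,0 +1,3 @@
+# Preface
+
+Some things you should know before deciding to read this book in earnest.
diff --git a/src/ch1/01-01-whats-spa.md b/src/ch1/01-01-whats-spa.md
new file mode 100644
index 0000000..6664dc5
--- /dev/null
+++ b/src/ch1/01-01-whats-spa.md
@@ -0,0 +1,107 @@
+# What is static program analysis
+
+> Static program analysis is the process of deriving **non-trivial** properties of a program purely by analyzing it **statically**, **without compiling** it to a binary and testing it against test cases.
+
+
+## An abstract definition and interpretation of static program analysis
+
+### A one-sentence definition (TODO: Chinese translation)
+
+> Static analysis analyzes a program P to reason about its behaviors and determines whether it satisfies some properties before running P.
+
+### A one-sentence interpretation (TODO: Chinese translation)
+
+> Ensure (or get close to) soundness, while making good trade-offs between analysis precision and analysis speed.
+
+In other words: guarantee (or come close to) soundness while balancing analysis precision against analysis speed.
+
+## A concrete explanation of static program analysis
+
+The two most important techniques in static program analysis are Abstraction and Over-Approximation.
+
+### Abstraction
+
+**Abstraction is the process of mapping values from the Concrete Domain to the Abstract Domain.**
+
+An example: (TODO: add figure)
+
+In the Concrete Domain, a variable's value may be a specific value, or the value of some expression or function return.
+
+In the Abstract Domain, a variable's value falls into one of five classes:
+
+- positive
+- negative
+- zero
+- unknown: determined by an expression or a function return. At run time the program will have a concrete positive, negative, or zero value, but before running, the expression alone does not determine which. For example, with C's ternary operator (TODO: add link), the value of x in `x = flag ? 1 : -1` is unknown.
+- undefined: the program hits an error at run time and produces undefined behavior (TODO: add link). In many languages, dividing by zero typically triggers an exception or a hardware fault, so the value of a in `a=b/0` is undefined.
+
+unknown is usually written as an upright T (⊤), read as top; undefined is usually written as an upside-down T (⊥), read as bottom.
+
+### Over-approximation
+
+**Over-approximation is mainly the way of thinking applied when operating on abstract values.**
+
+Continuing the example above, operations fall into two kinds: Data flow and Control flow:
+
+#### Data flow
+
+(TODO: add figure)
+
+- Adding two positive numbers gives a positive number.
+- ...
+- Adding a positive number and a negative number gives unknown/top. **This is because before running we know only the signs of the operands and cannot tell whether the result is positive, negative, or zero, although the program will have a concrete value at run time.**
+- Dividing unknown/top by 0 gives undefined. **This is because it triggers undefined behavior.**
+
+#### Control flow
+
+Program execution often involves branches. If x is assigned an abstract positive value in one branch and an abstract negative value in the other, which of the five classes should x's abstract value be at the merge point?
+
+For x at the merge point, **before running we know only the sign of each abstract value and cannot tell which one results, although the program will have a concrete value at run time**. Neither positive nor negative accurately describes the state of x after the merge, so its abstract value there is unknown/top.
+
+
+## Rice's Theorem and the goals of static program analysis
+
+### Rice's Theorem
+
+> Any non-trivial property of the behavior of programs in a r.e. language is undecidable.
+(TODO: add Chinese translation)
+
+(TODO: add Chinese translation)
+This sentence contains several difficult terms, explained one by one below:
+- non-trivial properties ~= interesting properties ~= the properties related to run-time behaviors of programs
+- r.e. (recursively enumerable) language: recognizable by a Turing machine
+- There is no approach that determines whether P satisfies such a non-trivial property, i.e., that gives an exact answer: Yes or No
+- Hence no perfect (sound & complete) static analysis exists
+
+### Two important concepts: Sound and Complete
+
+> Think about it: as a developer running a static analyzer on your own code, which situation would you find worse?
+>
+> - The static analyzer **fails to report code that is actually wrong**.
+> - The static analyzer **reports correct code as wrong**.
+
+Over- and under-approximations are both for safety of analysis.
+
+sound: reports every problem. may analysis: outputs information that may be true (over-approximation) (safe=over)
+
+complete: every reported problem is real. must analysis: outputs information that must be true (under-approximation) (safe=under)
+
+
+### Design goals of static analyzers in practice
+
+(TODO: add figure)
+**In practice, perfection is usually out of reach, so compromises are necessary.**
+
+A static analyzer needs to produce, within acceptable time, a result precise enough for the task. To that end it can:
+
+Compromise soundness (false negatives: problems may be missed)
+
+Compromise completeness (false positives: spurious reports are possible) (what most analyses do, because soundness matters a great deal)
+
+## Want another five bucks' worth?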
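+
+### On undefined behavior (TODO: add link and explanation)
+
+### On program testing versus analysis (TODO: add link on program analysis and a comparison)
+
+### On terminology commonly used for decision problems
diff --git a/src/ch1/01-02-ir.md b/src/ch1/01-02-ir.md
new file mode 100644
index 0000000..d0f9032
--- /dev/null
+++ b/src/ch1/01-02-ir.md
@@ -0,0 +1,51 @@
+# Intermediate representation: the input to static analysis
+
+## Starting from the components of a compiler
+
+A typical compiler consists of the following parts:
+- Scanner: reads source code, performs lexical analysis with the help of regular expressions, and outputs tokens or reports lexically invalid input. For example, `goouojd` is not a valid English word, so it would be reported as an error.
+- Parser: reads tokens, performs syntax analysis using a context-free grammar, and builds an abstract syntax tree or reports syntactically invalid input. For example, `Like your hair I` does not follow English grammar, so it would be reported as an error.
+- Type Checker: reads the abstract syntax tree, performs semantic analysis with an attribute grammar, checks type compatibility and so on, and outputs a Decorated AST or reports semantic errors. For example, `Apples eat you` is grammatically correct, but no apple eats people; that is a semantic error.
+- Translator: reads the Decorated AST and, by traversing it, translates the tree-shaped IR (the AST) into a linear IR (usually in three-address code form). Machine-independent code optimization can then be performed **through static analysis**.
+- Code Generator: reads the linear IR and, for the specified CPU instruction set, converts the IR, which the machine cannot execute directly, into machine code that it can.
+
+### Comparing AST and IR
+
+AST:
+- high-level and close to grammar structure
+- usually language dependent
+- suitable for fast type checking
+- lack of control flow information
+
+That is:
+- closer to the structure laid out by the grammar
+- usually tied to a specific language
+- well suited to fast type checking
+- carries no control flow information
+
+IR:
+- low-level and close to machine code
+- usually language independent
+- compact and uniform
+- contains control flow info
+- usually considered as the basis for static analysis
+
+That is:
+- closer to machine code
+- usually not tied to any language
+- compact and uniform in form
+- carries control flow information
+- generally regarded as the basis of static analysis
+
+**Overall, a linear IR is better suited to static analysis. There is no fixed definition of a linear IR; in practice three-address code is commonly used.**
+
+~~In computing, using the English term directly often conveys a concept more clearly. If you think there is a better translation, please contact the author (by email or GitHub issue).~~
+
+### Three-address code
+
+> Definition: Each 3AC contains at most 3 addresses (name, constant, temporary)
+
+Definition: each three-address instruction contains at most 3 addresses (an address can be a named variable explicitly defined by the programmer, a constant, or a temporary variable).
+Some forms
+
+TBD
\ No newline at end of file
diff --git a/src/ch1/README.md b/src/ch1/README.md
new file mode 100644
index 0000000..fa58dfb
--- /dev/null
+++ b/src/ch1/README.md
@@ -0,0 +1,4 @@
+# Getting started
+
+This part introduces what static program analysis (hereafter simply "static analysis") is, what it is used for, and why it is worth studying and researching.
+
diff --git a/src/ch2/02-00-dataflow-analysis.md b/src/ch2/02-00-dataflow-analysis.md
new file mode 100644
index 0000000..4d9e5c9
--- /dev/null
+++ b/src/ch2/02-00-dataflow-analysis.md
@@ -0,0 +1,4 @@
+# Dataflow analysis
+
+TBD
+
diff --git a/src/ch2/02-01-reaching-def-analysis.md b/src/ch2/02-01-reaching-def-analysis.md
new file mode 100644
index 0000000..b430fdb
--- /dev/null
+++ b/src/ch2/02-01-reaching-def-analysis.md
@@ -0,0 +1,4 @@
+# Dataflow analysis: reaching definitions analysis
+
+TBD
+
diff --git a/src/ch2/02-02-live-var-analysis.md b/src/ch2/02-02-live-var-analysis.md
new file mode 100644
index 0000000..a733705
--- /dev/null
+++ b/src/ch2/02-02-live-var-analysis.md
@@ -0,0 +1,4 @@
+# Dataflow analysis: live variable analysis
+
+TBD
+
diff --git a/src/ch2/02-03-available-exp-analysis.md b/src/ch2/02-03-available-exp-analysis.md
new file mode 100644
index 0000000..12dd58f
--- /dev/null
+++ b/src/ch2/02-03-available-exp-analysis.md
@@ -0,0 +1,18 @@
+# Dataflow analysis: available expressions analysis
+
+## Definition
+
+> An expression x op y is available at program point p if (1) all paths from the entry to p must pass through the evaluation of x op y, and (2) after the last evaluation of x op y, there is no redefinition of x or y.
+
+## Understanding the definition: a tricky example
+
+1. Intuitively, the expression is not available at c, but by the definition it is. At run time, though, only one branch is actually taken.
+
+2. Recap:
+must -> under-approximation -> everything reported is true. False positives are not allowed, but false negatives are.
+may -> false negatives are not allowed, but false positives are.
+Now revisit the example with this in mind.
+
+PS. How do you remember which boundary condition to set, and whether it should be empty or full? And whether all remaining blocks should be initialized to empty or full? (A handy trick: once you know whether the analysis is may or must, if the meet at control-flow merges is intersection, most blocks are generally initialized to full.)
+
+A frequently confusing question: gen first or kill first? Going back to the definition settles it.
diff --git a/src/ch2/README.md b/src/ch2/README.md
new file mode 100644
index 0000000..152d806
--- /dev/null
+++ b/src/ch2/README.md
@@ -0,0 +1 @@
+# Theory and applications of dataflow analysis
diff --git a/src/ch3/03-00-dataflow-analysis.md b/src/ch3/03-00-dataflow-analysis.md
new file mode 100644
index 0000000..70d2b46
--- /dev/null
+++ b/src/ch3/03-00-dataflow-analysis.md
@@ -0,0 +1,4 @@
+# Dataflow analysis example 1: Reaching Definition
+
+TBD
+
diff --git a/src/ch3/03-03-available-exp-analysis.md b/src/ch3/03-03-available-exp-analysis.md
new file mode 100644
index 0000000..52d0cd0
--- /dev/null
+++ b/src/ch3/03-03-available-exp-analysis.md
@@ -0,0 +1,18 @@
+# Available expressions analysis
+
+## Definition
+
+> An expression x op y is available at program point p if (1) all paths from the entry to p must pass through the evaluation of x op y, and (2) after the last evaluation of x op y, there is no redefinition of x or y.
+
+## Understanding the definition: a tricky example
+
+1. Intuitively, the expression is not available at c, but by the definition it is. At run time, though, only one branch is actually taken.
+
+2. Recap:
+must -> under-approximation -> everything reported is true. False positives are not allowed, but false negatives are.
+may -> false negatives are not allowed, but false positives are.
+Now revisit the example with this in mind.
+
+PS. How do you remember which boundary condition to set, and whether it should be empty or full? And whether all remaining blocks should be initialized to empty or full? (A handy trick: once you know whether the analysis is may or must, if the meet at control-flow merges is intersection, most blocks are generally initialized to full.)
+
+A frequently confusing question: gen first or kill first? Going back to the definition settles it.
\ No newline at end of file
diff --git a/src/ch3/README.md b/src/ch3/README.md
new file mode 100644
index 0000000..152d806
--- /dev/null
+++ b/src/ch3/README.md
@@ -0,0 +1 @@
+# Theory and applications of dataflow analysis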