Looking for: A good paper on how TLS-in-TLS detection works? #281
No. A few months ago, the author of a paper on TLS-in-TLS handshake characteristics (not yet published) reached out to me and @yuhan6665, and I suggested some changes. The paper also made us realize how urgent it is to implement Vision Seed, which we happen to be developing right now. I will also add an explanation of how Trojan-killer works, among other things.
If I understand correctly, the paper actually states that protocols like XTLS Vision, which simply add padding to the connection header, don't work against their own homemade TLS in crypto recognition model, and there is currently no evidence that the GFW applies such checks either. In other words, not only is the protocol complex and undefined, it is also meaningless for anti-censorship purposes. Your Trojan-killer simply checks the first packet length and claims to have no false positives on your own device. To the doubter, you even said that you don't know how to open a .pcap file. Using Chinese to promote your project in the anti-censorship community and exaggerating the dangers of TLS in crypto traffic characteristics does not actually help in understanding the GFW.
First of all, I don't think it is appropriate to state publicly what someone else's unpublished paper says after reading it. Your description covers only part of the paper and is not accurate, but if I say much more I would reveal more of its contents, including the testing methodology, the data for other protocols, and the comparisons between them. The most I can disclose is that, according to the author, the generic TLS-in-TLS handshake detection model performs very poorly against Vision's strong padding, so the only option is "protocol inverting", which is one of the problems Seed is meant to solve.
As for the word "either", let me add some background: this person said TLS-in-TLS detection is hype, and another person said it was just a ploy for traffic, so I wrote Trojan-killer solely to show that the problem really exists. You and I both know the difference between Vision and Trojan, and we both know that Trojan users often report one port being blocked per day. Could this large-scale real-world user testing count as the "evidence" you want? Also, don't forget that the "insider" who appeared right here said the GFW has already deployed this kind of detection. He may not be credible, but I have shown that we can do the detection ourselves, and large-scale user testing corroborates it.
It is not just the first packet. It is "the total amount of data the client sends from its CCS (inclusive) until the server sends data" and "the total amount of data the server sends before the client sends data again", so what my detection actually constrains is a timing condition. And it is not "no false positives" but about one in a thousand (Tun mode).
As for the "doubter" you mention, he openly told us that "if a simple size match could filter all TLS traffic, then the foundation of cryptography would not exist". I'll leave it to everyone to judge the level of that statement; I have already given my assessment. As for that file, I really did wait a long time without being able to open it. My Wireshark was very sluggish during that period. After reinstalling the system, I found the likely cause: to capture Chrome traffic I had enabled key export so that Wireshark could decrypt it, and the exported keys kept accumulating, which made things slow over time.
How does using Chinese have some special effect? Haven't I always used Chinese? Is this another angle to attack me from? The real reason I use Chinese is simple: my command of English is nowhere near good enough to change my linguistic characteristics at will, since it is not my native language, whereas doing that in Chinese is much easier.
That doesn't seem to be the case.
You should be aware that the cause identified at the time was Go TLS fingerprinting, and there were reports of unconditional blocking of port 443. In any case, your method does not use machine learning, does not appear able to handle different Trojan implementations, and would not be used by the GFW; there is also no evidence that the GFW applies this kind of machine-learning detection. Given the false positive rate and performance problems of such models, large-scale deployment would still take a long time.
I recall that this insider also claimed that the GFW blocks traffic based on characteristics of AES-encrypted data. Is that also the "cryptography does not exist" you have in mind?
You are indeed promoting a threat without evidence, and your "detection" does not prove that the GFW can apply, or has applied, such checks.
Exactly, a paper demonstrating TLS-in-TLS is what I am looking for (and specifically, a link), or a paper that explains what testing has been done to identify how the GFW does it. It's hard for me to understand not just the urgency for Vision Seed (on which I will "just trust" you) but also how it works under the hood, because I don't understand what exact features it aims to eliminate. and yes, ideally it would be in english. I will also say that it is currently very hard for an outsider to study anything related to the XTLS project. It may be intentional, but hence my original question, I am looking for a technical writeup of:
announcement of XTLS vision addresses 4 and 5, and is really vague about 3. if this is intentional due to strategic reasons it is fine, there's a reason i asked the original question in a research forum though, and not in the XTLS bugtracker |
I can tell you that this paper does not study the behavior of GFW, but constructs a machine learning model by itself. If it weren't for rprx's publicity, I'm afraid the community wouldn't know how dangerous it is. With the exception of the XTLS vision padding, everything about XTLS is about performance optimization and is unreviewed, and older versions of it have identifiable issues. |
This comment was marked as off-topic.
I think there is merit in proactively eliminating features of a protocol before others get to build detection around it. I did not intend to start a discussion around that or to discuss how the XTLS project is led. I am here to understand how TLS-in-TLS detection works, and, if the GFW does not do it, how it could be done. Again, even after this long unrelated debate, I have not seen a decent writeup on it.
Please look at that table carefully. The other protocols (including some muxes) were tested with the generic model, while obfs and Vision were tested with two targeted models, i.e., "protocol inverting" applied to Vision, which is one of the problems Seed is meant to solve. I told the author that the results of the generic model against Vision should be included; the author then told @yuhan6665 and me what I described above, and that is why those results were not included at the time.
If you think it is only fingerprinting: uTLS has been widely rolled out this year and is available to both Trojan and Vision, yet people still frequently report one port blocked per day with the former. Trojan-killer is only an illustrative PoC that exposes the TLS-in-TLS handshake problem, so why must it be based on machine learning? It was not written to be polished enough for the GFW to use. Also, I have no internal data on the false positive rate or performance of the GFW's method, but I believe such detection is not a problem for the GFW's resources.
They are not the same person. My comment on the AES-in-AES claim was: "As for AES in AES, I also think it is a bit far-fetched, but he said it is related to hardware, which is not my specialty." Please note that your statements have contained factual errors like this several times.
I suggest you test it yourself. Since we can all detect the classic TLS-in-TLS handshake at low cost, why do you still think the GFW is incapable of doing so? Whether the GFW applies such checks has already been addressed above. This is something people have actually run into, and a difference that shows up in real-world testing.
@mmmray Please don't misunderstand: I mean he should post a direct, on-topic response here. This is just a discussion of technical issues.
I guess it is worth actually explaining TLS-in-TLS detection before the flamewar heats up, as people tend to get involved in the war without a proper understanding of the underlying issue. First, let's take a look at TLS. Modern TLS usage is almost provably secure under the LHAE security notion. That is, vulnerabilities keep cropping up, but for most people that's just fine. An attacker's power is greatly limited compared to plaintext protocols:
These properties may seem strong enough for a generic secure transport, but not necessarily sufficient. It is common to want to hide some information in the handshake messages, hence ESNI and ECH. It is common to want to bind a TLS stream to another stream, where existing solutions have massive problems. Worse, for anti-censorship developers and malware developers, every aspect not covered by that notion of security is troublesome:
TLS-in-TLS detection is just a specialized form of flow analysis, because the lengths and timing of a browser-initiated TLS handshake are so typical. It takes no machine learning to classify: the Client Hello is always 517 bytes, the Certificate is always huge (about 4 KiB), and the timing is predictable, such as the first Application Data after Change Cipher Spec. People have been classifying applications this way for a long time.
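The length-and-direction observation above can be illustrated with a minimal sketch, assuming packets have already been reduced to (direction, payload length) pairs. The function names, the "c"/"s" direction encoding, and every threshold below are hypothetical values chosen for illustration; they are not taken from Trojan-killer, the GFW, or any paper.

```python
from typing import Iterable, List, Tuple

# A packet is (direction, payload_length). Direction is "c" for client to
# server and "s" for server to client. This encoding is an assumption.
Packet = Tuple[str, int]


def bursts(packets: Iterable[Packet]) -> List[Packet]:
    """Collapse consecutive same-direction packets into (direction, total_bytes)."""
    out: List[Packet] = []
    for direction, length in packets:
        if out and out[-1][0] == direction:
            out[-1] = (direction, out[-1][1] + length)
        else:
            out.append((direction, length))
    return out


def looks_like_tls_handshake(packets: Iterable[Packet]) -> bool:
    """Toy rule: small client burst, large server burst, small client burst.

    All thresholds are made up for illustration only.
    """
    b = bursts(packets)
    if len(b) < 3:
        return False
    (d0, n0), (d1, n1), (d2, n2) = b[:3]
    return (
        (d0, d1, d2) == ("c", "s", "c")
        and 400 <= n0 <= 700   # roughly the size of a padded Client Hello
        and n1 >= 3000         # Server Hello plus a certificate chain
        and n2 <= 400          # short client reply that ends the handshake
    )


# Hypothetical flow: one client packet, three server packets, one client packet.
flow = [("c", 517), ("s", 1400), ("s", 1400), ("s", 1300), ("c", 80), ("s", 500)]
print(bursts(flow))
print(looks_like_tls_handshake(flow))
```

A real detector would also have to account for tunnel framing overhead and timing, which this sketch ignores.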
(continued) To mitigate these problems, there came uTLS, application fronting, padding, and many more techniques. What is being discussed here is padding.
Who knows (except that inner ghost). No one knows what those heuristics are. Fixing broken heuristics to accommodate ECH takes just several hours.
Padding has been shown to be somewhat effective in hiding inner stream characteristics. Good news: TLS can be arbitrarily shaped. Bad news: TLS can be made really slow with long padding. And effectively hiding the inner TLS stream requires rather long padding. XTLS chose to pad only the first few packets and leave the remainder as raw, unaltered browser traffic, hence the somewhat opinionated breakage of V2Ray's composability.
People disagree about the best way to pad records. NaïveProxy uses short, always-on padding with multiplexing. Is long, specialized padding better, or short, indiscriminate padding? And the draft being discussed has not paid attention to the longer stream. Will XTLS allow an attacker to determine what website you're visiting? Likely, if you are under high suspicion.
That said, the padding strategy is somewhat subjective and subject (pun intended) to the threat model. Xray has an opinionated threat model yet advertises itself as the objectively better choice. Trojan and Naïve have a very generic threat model, but it can be easily hammered down (with massive collateral damage) in situations like Iran. Is Xray bad for such advertising? WireGuard is doing the same thing. Xray did well in being the stable go-to solution for many people.
Please at least take a look at the threat model before starting a flamewar. You can get away with nearly any protocol with a very clean IP. You can get away with Trojan-over-Trojan. There are a thousand weird ways to build a usable protocol, yet people want a stable protocol. There are even times when XTLS can get you into trouble, but how can you know that without learning enough to become a developer? It is simply hard to predict which new features to build into a protocol.
What if another inner ghost leaks that the GFW is using another way of detection instead of TLS-in-TLS length? Just try to be nice please. |
Inside a tunnel, of course, you have freedom to make the communication pattern anything you want it to be. You can specify a scheme for padding and have packet sends operate according to a schedule that is independent of the actual application payload (TLS or whatever it may be). You could even make it look like the tunneled protocol is a "server first" protocol, by having the server send padding at the start, and having the client buffer its initial data until it has received the padding from the server. #9 (comment) has a sketch of such a padding scheme, and "Security Notions for Fully Encrypted Protocols" from this year's FOCI also shows how to achieve arbitrary traffic patterns using buffering. Even tunnel protocols that do not nominally support padding might be able to work in this way. Shadowsocks AEAD does not have explicit padding support, but you can simulate padding by encrypting and tagging zero-length ciphertexts. So a Shadowsocks AEAD tunnel could disguise its contents by first having the server send sufficiently many empty ciphertexts to make, say, 200 bytes; the client would buffer its outgoing data until after receiving bytes from the server, and then both sides could continue normally. Something like this is likely to defeat a simple tunneled TLS detector. Here's an idea for a hack to transforming TLS into a "server first" protocol, even when the tunnel protocol does not support padding. You can have the server immediately transmit a fixed prefix of the Server Hello, say This would be easy to implement as a pair of netcat-like wrapper programs. On the client:
On the server:
|
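The "200 bytes of empty ciphertexts" idea in the comment above can be checked with quick arithmetic. This is a minimal sketch, assuming the Shadowsocks AEAD TCP chunk layout (a 2-byte encrypted length plus a 16-byte tag, followed by the encrypted payload plus a 16-byte tag) and ignoring the per-stream salt sent once at the start. The helper names are hypothetical.

```python
import math

TAG_LEN = 16    # AEAD tag length (assumed, per the Shadowsocks AEAD chunk layout)
LEN_FIELD = 2   # encrypted 2-byte payload length field


def chunk_wire_size(payload_len: int) -> int:
    """Bytes on the wire for one chunk: length field + tag + payload + tag."""
    return LEN_FIELD + TAG_LEN + payload_len + TAG_LEN


def empty_chunks_for(target_bytes: int) -> int:
    """Number of zero-length chunks needed to emit at least target_bytes."""
    return math.ceil(target_bytes / chunk_wire_size(0))


n = empty_chunks_for(200)
print(chunk_wire_size(0), n, n * chunk_wire_size(0))
```

Under these assumptions an empty chunk costs 34 bytes, so six of them (204 bytes) are enough to cover the 200-byte example.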
@wkrp Thanks for the addition. Traffic analysis inside encrypted tunnels is an area with a lot of existing research. Back in VLESS BETA, I briefly described the two most basic classifications:
VLESS Flow and Seed exist to provide different traffic patterns and configurable strategies to counter these threats, including reshaping traffic into the shape of a target website. Each has its strengths and weaknesses, and recently I have also been working on some rarely mentioned problems. I believe the IETF still has at least a few companion standards to develop and promote after ECH.
I have been watching MASQUE, which is developing standards for proxying (even proxying UDP and IP packets) over HTTP/3. It's used in iCloud Private Relay already. From what I have seen of their documents, there is not much in the way of protection against protocol fingerprinting. draft-ietf-masque-quic-proxy-00 briefly mentions padding, in the context of input–output correlation:
HTTP/2, QUIC, and TLS have provisions for padding, but they are kind of "inside-out" from how we would prefer them to be. It's easier for our purposes if the padding is the outer layer: padding(TLS), not TLS(padding). This recent Tor proposal is related. It doesn't have specific solutions, more of an overview of the problem (also including things like circuit tagging that are specific to Tor). "Prioritizing Protocol Information Leaks in Tor"
But I think your approach is in the right direction. We need to get away from simplistic padding schemes that only add padding or split packets, like traffic morphing and obfs4's iat-modes, and actually decouple the traffic patterns of the tunneled application from the traffic patterns of the tunnel itself. One question I am still struggling with is what the tunnel's traffic sending schedule should actually be; i.e., client-first or server-first, what mix of burst sizes and directionality, etc. Cf. #255 (comment).
ideal pattern would be aligned with traffic of the target website in the case of XTLS or domain-front. However, in those deployments the proxy does not know what normal traffic to the target website looks like. If I (as Xray operator) operated the target website myself, I would have precise traffic measurements to the real website that I can extract patterns from. Therefore I think it would be ideal if the pattern itself would be part of client configuration, similar to SNI in domain-fronting, and that a protocol would leave a generous amount of API surface for custom patterns. Ideally there'd be both 1) initial pattern as part of client config for bootstrapping 2) next to transmitted data, a place for the server to tell the client which pattern to use next
I found this document with the same info, are there details on what can be (or has to be) configured? or is flow/seed still WIP |
I think this might be where ECH can actually be helpful. Precisely matching the traffic patterns of a somewhat "static" website can be challenging to get every details right. Targeting a "generic" domain used as the outer ECH SNI (e.g., cloudflare-ech.com) should be less demanding. |
Is it ok to discuss this paper publicly when it was shared privately?
The formation of the concept of TLS-in-TLS in the community implied a belief in a certain heuristic: that this is a more important traffic feature than others. Research in this field, however, typically studies general traffic classification with general ML models rather than heuristics and hand-crafted classifiers, and as a result there is less insight into the explainability of the ML models: which features are structurally more important, and why. Another reason for the belief in the heuristic is that it is doubly difficult to build a general traffic obfuscator that reliably defeats a general traffic classifier, compared with building the classifier itself. There are no publicly available general traffic classifiers in this space (the ones deployed in China, Iran, etc.) that can be used to study their behavior, build an adversarial general obfuscator model (i.e., the traffic morphers, traffic shapers, etc. mentioned above), and then experiment with and verify its performance against the classifiers. So if the most important factor for the general traffic classifiers can be identified, it is much easier and more realistic to implement specific, scope-limited countermeasures that mitigate and hinder the general traffic classifiers.
To be fair, I was not contacted by the original authors. The authors of said paper posted a graph in a public thread. I kept looking at the graph until I found out it is related to an unpublished research. I am very sorry about leaking the details and have redacted related information from my comments.
This. SSRoT is another project that just works. Survived with glaring features like original SSR. Does it mean SSRoT's obfuscation is effective? No, it is because no one cared enough. |
Yesterday, @yuhan6665 and I contacted the author of that paper again to ask when it will be published. The author said, "The paper was submitted for review and accepted, but we are still making some revisions, such as emphasizing that the Vision results come from a separate, more targeted classifier, to avoid misunderstandings," and "It is expected to be published in late September."
What country is the author from?
In addition, I would like to discuss the performance, interpretability, and false positive rate of machine learning, to avoid some misunderstandings. I am not an expert in this area, but I have done some research and have actually trained and deployed some models.
Many people think of machine learning or deep learning as "resource intensive", which is not entirely accurate. For the vast majority of models, the obviously resource-intensive part is only training, that is, the process of repeatedly adjusting the weights (
Interpretability is an issue mainly because a trained model may simply "happen to work in a strange way". It is not intuitive: humans know that it works but cannot see how, and all the more so when the parameter count is large. For relatively simple data sources, however, controlled-variable tests are enough to work out how it operates.
Finally, the false positive rate: given the same resources, the more targeted the model, the lower the false positive rate,
I think ECH is the IETF's version of these techniques, and it shares some of the same problems. Technically, an explicit "generic" domain does look more reasonable, but in practice the GFW will not accept it. It will never allow people to browse, at scale and without any proxy, the websites it wants to block; otherwise the whole censorship regime built on the GFW's existence would become a dead letter. For the GFW, if precise blocking is impossible, the only remaining option is to block everything. In the HTTP era it could still inspect your keywords; in the HTTPS era, if you do not cooperate in blocking certain content, it blocks your entire domain.
Continuing on this topic: I don't think the TLS-in-whatever feature is necessarily the more important one, but it is at least fairly obvious, widespread in its impact, and easy for censors to exploit:
Taken together, these factors mean that if we want to write a proxy that "tries as hard as possible not to reveal that it is a proxy", we have to deal with the TLS-in-whatever feature in some form.
This is a key point. The censor's goal is not to detect tunneled TLS per se—it is to detect tunnels, period. It's just that because TLS makes up such a large proportion of all traffic, whenever you have any kind of tunnel, at some point you are going to send TLS through it. If you don't do something to disguise the timing and directionality pattern of TLS, then that evidence of tunneling will show through. In other words, it's not the "TLS" part the censor cares about, it's the "tunnel" part. The "TLS" is just a handy identifier because it's so common and it has a characteristic traffic pattern. |
This is spot on. You cannot avoid generating this signature by simply not using TLS.
It would be great if browsers could share some responsibility with proxies. Imagine a GREASE extension that could fit into any TLS packet type, serving only to inflate packet sizes. But I guess as long as TLS-in-TLS remains a "niche" security concern affecting only users from select regions, such an initiative may remain unlikely. |
TLS does have a built-in feature to pad records that are encrypted: https://www.rfc-editor.org/rfc/rfc8446.html#section-5.2
https://www.rfc-editor.org/rfc/rfc8446#section-5.4
I don't think you can use this record padding feature in the Client Hello, but there you can use the padding extension:
Here's @ValdikSS's past demonstration of using the padding extension to get around a filter: https://ntc.party/t/http-headerstls-padding-as-a-censorship-circumvention-method/168 That was intended for when the TLS is sent without a tunnel around it, but it could also work to break up the traffic signature. Unfortunately, merely adding padding doesn't change the overall directionality of bursts. I'm not sure if it's possible in TLS to, for example, send "no-op" records before the handshake, and in any case changing the directionality would likely require cooperation from the server. |
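The record padding mechanism of RFC 8446 section 5.4 can be sketched as follows. In TLS 1.3 the plaintext that gets encrypted is the content, followed by one byte holding the real content type, followed by any number of zero bytes; the receiver strips trailing zeros and treats the last non-zero byte as the content type. This is a minimal illustration of that layout only (no encryption), and the function names are hypothetical.

```python
from typing import Tuple

APPLICATION_DATA = 23  # TLS ContentType value for application_data


def pad_inner_plaintext(content: bytes, content_type: int, pad_len: int) -> bytes:
    """Build a TLSInnerPlaintext: content, real content type, zero padding."""
    if not 1 <= content_type <= 255:
        raise ValueError("content type must be a non-zero byte")
    return content + bytes([content_type]) + bytes(pad_len)


def unpad_inner_plaintext(inner: bytes) -> Tuple[bytes, int]:
    """Strip trailing zeros; the last non-zero byte is the content type."""
    stripped = inner.rstrip(b"\x00")
    if not stripped:
        raise ValueError("no content type found")
    return stripped[:-1], stripped[-1]


inner = pad_inner_plaintext(b"hello", APPLICATION_DATA, 10)
print(len(inner), unpad_inner_plaintext(inner))
```

Because the content type is never zero, content that itself ends in zero bytes still round-trips correctly. As the comment notes, this kind of padding changes record sizes but not the direction of bursts.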
I was referred to a paper from this year's SIGCOMM. While it doesn't look at TLS-in-TLS specifically, section 7 explores encrypted flow classification, with a subsection on how to classify SMTP flows when tunneled within TLS.
The methodology proposed in the paper might provide insights into detecting TLS within TLS. The basic premise is to train a classifier on plaintext protocols (like plain TLS) using features that remain stable/visible post-encryption such as packet sizes, direction, and timing. This classifier can then be applied to the payload part of encrypted flows. It seems that their classifiers achieved pretty decent precision for detecting SMTP-in-TLS.
|
And HTTP/2 the protocol also has builtin padding fields, but among the implementations of the two protocols paddings are mostly an afterthought and it's annoying to try to create paddings through the existing APIs of these implementations without patching their cores. In terms of sustaining long-term maintenance, I'd prefer not to use the forgotten builtin paddings.
I have this intuition so far: the tunnel's traffic schedule should parrot what the tunnel "should" look like if it were not a tunnel. As an example, I have an HTTP/2 tunnel that sends
The struggle is probably in coming up with a general traffic schedule, but if there is a more specific scope for parroting, it is easier to narrow down the target distribution. But not too specific: this is not the classic definition of parroting, as we are not parroting a particular application or protocol in terms of its structure, but its traffic distribution. The straightforward, brute-force way would be to train a generative model on sufficient data from an entire class of target traffic, and use that to generate the traffic schedule you want. But I hope there are cheaper heuristics that raise the floor of detection high enough to achieve circumvention. Edit: One more thing. I believe the traffic schedule should be more general than site-based. Re:
There are several issues with this level of specificity. It's not economical to require every operator to generate their own traffic schedules, as that requires highly automated tooling for generation (which OS, which browser, one schedule per OS/browser? What about updates or concept drift?), plus more tools for verifying that the generated schedules are actually OK (a. the operator can inadvertently generate a traffic profile of e.g. Google.com that is known to every website fingerprinter if they choose to mirror it; b. is it even possible to have this kind of adversarial tool?). Schedules generated from the data of one website may, due to their limited scope, be so specific that they risk becoming the classic parrot.
My thoughts are in this direction as well. In website fingerprinting research they always try to quantify the overhead: how much the defense costs, in terms of bandwidth and latency. But for our purposes, it's likely that the very beginning of a connection is, by far, the most important. (Probably just the first few packets, even.) If we traffic-shape just, say, the first 10 KB of a connection in both directions, and revert to "natural" shaping after that, that's likely to put us ahead of the game for a long time, and the overhead will be asymptotically negligible. Rather than first trying to figure out the question of what a traffic schedule should look like, I'm thinking about the possibility of defining a "challenge" with a few simple schedules for circumvention developers to try implementing. These would be "strawman" schedules, not designed for effective circumvention, but just to give developers a common target to work towards (as I expect implementing even these will require some internal code restructuring). When a few projects have developed the necessary support for shaping traffic according to a schedule, then it will be easier to experiment with alternatives. This is the kind of thing I am thinking of:
|
@wkrp thanks for the writeup. Seems like a good coding task for us. So far we have tried to implement a simple and efficient structure in Xray for padding and shaping the first few packets. It only accounts for the number of packets. In the future, we should track more traffic state, such as bytes received. We are currently looking to expose customization of these schedules to users. I wonder what the suitable way/level of config is. (@RPRX thought of putting everything into a "seed", like Minecraft.) I also find your choice of specific numbers interesting. Some numbers I roughly get: 1400, ping 20, pong 40 are common traffic patterns. What about {120, 170, 250} {250, 255, 270}? Also, what is the probabilistic meaning behind choosing the Beta distribution?
The numbers are just some numbers I made up. There's no meaning to them—I was just trying to think of some schedules that might pose some design challenges. Please do not take the ideas I sketched as recommendations for good traffic schedules. They are bad traffic schedules, in fact. I am just brainstorming ways to make progress towards general and effective traffic shaping. My thinking is that there are two obstacles: (1) current systems need to be rearchitected to be more flexible in the kind of traffic shaping they support, and (2) we need to find out what traffic schedule distributions are practical and effective. I find myself thinking about (2) perhaps too much (as in #281 (comment)), and I reflected that a more productive path forward may be to get more developers thinking about (1). We can tackle problem (1) first, targeting artificial "strawman" traffic schedules; then we'll have the infrastructure necessary to comfortably experiment with problem (2). My idea was that posting a list of concrete "challenges", we can get everyone working on a common problem and thinking about the issues involved. I didn't intend #281 (comment) to be a final list of recommendations. I think it should get some more thought. But a list of traffic shaping challenges could look something like that. The beta distribution is just from my intuition that uniform distributions are maybe not the best for natural-looking traffic features. But it's not important: the goal is not to prescribe a specific algorithm for implementation, it's to demonstrate that the software can handle different kinds of distributions. You can replace it with a uniform distribution or whatever. These are not recommendations for anything to be shipped to users, at this point. What I mean when I talk about design questions involved in traffic shaping, is that implementing even a simple traffic schedule requires at least two things:
These two things are what is required to move beyond simplistic, one-packet-at-time padding, and really decouple the observable traffic features of the tunnel from the traffic features of the application protocol inside the tunnel. Implementing this properly may require you to turn the main loop of your program "inside-out". I wrote about this in the past and made a sample patch for obfs4proxy: https://lists.torproject.org/pipermail/tor-dev/2017-June/012310.html
Also compare to the discussion in the recent "Security Notions for Fully Encrypted Protocols":
In their sample protocol of Figure 1, |
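One way to picture the decoupling described in this comment is a sender that emits frames on a fixed schedule, filling each frame with buffered application data when some is available and with padding otherwise. This is a minimal sketch under stated assumptions: `ScheduledSender`, its methods, and the example sizes are hypothetical, and framing overhead, timing, and encryption are ignored.

```python
from typing import Tuple


class ScheduledSender:
    """Buffers application data and releases it only in scheduled frame sizes."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        """Application side: queue data without sending anything yet."""
        self._buf += data

    def pending(self) -> int:
        """Bytes of application data still waiting to be sent."""
        return len(self._buf)

    def next_frame(self, size: int) -> Tuple[bytes, int]:
        """Return (payload, pad_len) for a scheduled send of exactly `size` bytes.

        The payload is taken from the buffer; whatever the buffer cannot
        fill is reported as padding, so the wire size never depends on
        how much data the application happened to write.
        """
        payload = bytes(self._buf[:size])
        del self._buf[:size]
        return payload, size - len(payload)


sender = ScheduledSender()
sender.write(b"x" * 30)            # the application writes 30 bytes at once
for size in (20, 40, 10):          # made-up schedule, not a recommendation
    payload, pad = sender.next_frame(size)
    print(size, len(payload), pad)
```

The observable sizes (20, 40, 10) come from the schedule alone: the 30 application bytes are split across the first two frames and the third frame is pure padding.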
The paper you are seeking about TLS-in-TLS detection is released in USENIX 2023: Source: Xray Telegram Group |
Thanks for sharing. Diwen Xue also recommended another paper of interest |
The recommendation suggests that even a detector with a less than practical precision/false positive rate should not be underestimated, because it intuitively becomes more powerful when its results are aggregated in coarse-grained analysis. So obfuscation strength matters quantitatively, and it is always useful to increase it. A simple countermeasure to host-based analysis is to insert dummy flows at the host level. But it may be logistically difficult to generate diverse traffic from diverse sources to the circumvention bridge at low cost.
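The aggregation effect can be illustrated with a small calculation. This is a sketch under strong assumptions: flows to a host are treated as independent, the per-flow true and false positive rates are invented for illustration, and the host is flagged when at least `k` of `n` flows are flagged. None of these numbers come from the paper.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-flow detector: weak on its own.
PER_FLOW_TPR = 0.60   # chance a proxy flow is flagged
PER_FLOW_FPR = 0.05   # chance an ordinary flow is flagged

# Hypothetical host-level rule: flag the host if >= 8 of 20 flows are flagged.
N_FLOWS, THRESHOLD = 20, 8

host_tpr = prob_at_least(THRESHOLD, N_FLOWS, PER_FLOW_TPR)
host_fpr = prob_at_least(THRESHOLD, N_FLOWS, PER_FLOW_FPR)
print(f"host-level TPR: {host_tpr:.4f}")
print(f"host-level FPR: {host_fpr:.2e}")
```

Under these assumptions the host-level false positive rate falls by several orders of magnitude while the true positive rate rises, which is why lowering the per-flow detection rate still matters even when the per-flow detector looks impractical.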
There is a thread now for this paper, with a summary: #312. |
related to #280, it got me thinking about TLS-in-TLS: I wonder if ECH and/or GREASE in the inner layer would (temporarily) confuse those heuristics.
but then, i realized, i have no idea how TLS-in-TLS is detected today. is there an (english) summary of it?
I only see some vague allusions towards it at XTLS/Xray-core#1295 (google translate) which talks about packet sizes and timings being detectable via machine learning. But it's not very concrete about how this detection works (I think it's a good read regardless)