Stanford Security Lunch
Summer 2021

June 16, 2021 FHE Development Ecosystem: Tools, Compilers & Challenges

Speaker:  Alexander Viand (ETH Zürich)

Abstract:  Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. Hence, it provides protection in situations where computations are carried out by an untrusted party. This powerful concept was first conceived by Rivest et al. in the 1970s. However, it remained unrealized until Craig Gentry presented the first feasible FHE scheme in 2009. Since then, FHE has gone from theoretical breakthrough to practical deployment. However, developing FHE systems remains complex, requiring expert knowledge.
In this talk, I walk through the inherent engineering challenges in developing FHE applications and discuss how tools like compilers that translate between standard programs and FHE implementations can step in to address some of these complexities. I will discuss recent work in this space and, using different case study applications that represent common aspects of FHE applications, highlight where barriers to entry have been successfully lowered and where they still remain. I will conclude by showing examples of what non-expert developers can achieve today and outline future directions for FHE compiler development.
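The programming model the talk describes — writing ordinary-looking code that a compiler lowers onto encrypted operations — can be illustrated with a toy additively homomorphic scheme. The sketch below is textbook Paillier with tiny hardcoded primes, chosen only because it is short and self-contained; it is not real FHE (no homomorphic multiplication) and is completely insecure at this key size.

```python
# Toy additively homomorphic encryption (textbook Paillier with tiny,
# hardcoded primes) to illustrate the programming model FHE tooling
# exposes: compute on ciphertexts, decrypt only the final result.
# NOT real FHE and wildly insecure at this key size.
import math
import random

p, q = 293, 433                # toy primes; real keys use ~1536-bit primes
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)   # private key
mu = pow(lam, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def he_add(c1, c2):
    # adding plaintexts corresponds to multiplying ciphertexts
    return (c1 * c2) % n2

a, b = encrypt(20), encrypt(22)
assert decrypt(he_add(a, b)) == 42   # the server never saw 20, 22, or 42
```

An FHE compiler's job, in this framing, is to translate a whole program into such ciphertext-level operations automatically, so the developer never writes `he_add` by hand.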

Paper:  2021 IEEE Symposium on Security and Privacy, arXiv

June 23, 2021 LZR: Identifying Unexpected Internet Services

Speaker:  Liz Izhikevich (Stanford)

Abstract:  Internet-wide scanning is a commonly used research technique that has helped uncover real-world attacks, find cryptographic weaknesses, and understand both operator and miscreant behavior. Studies that employ scanning have largely assumed that services are hosted on their IANA-assigned ports, overlooking the study of services on unusual ports. In this work, we investigate where Internet services are deployed in practice and evaluate the security posture of services on unexpected ports. We show protocol deployment is more diffuse than previously believed and that protocols run on many additional ports beyond their primary IANA-assigned port. For example, only 3% of HTTP and 6% of TLS services run on ports 80 and 443, respectively. Services on non-standard ports are more likely to be insecure, which results in studies dramatically underestimating the security posture of Internet hosts. Building on our observations, we introduce LZR (“Laser”), a system that identifies 99% of identifiable unexpected services in five handshakes and dramatically reduces the time needed to perform application-layer scans on ports with few responsive expected services (e.g., 5500% speedup on 27017/MongoDB). We conclude with recommendations for future studies.
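The core idea — identifying what a service actually speaks regardless of the port it listens on — can be sketched as layer-7 fingerprinting of a server's first bytes. The signature table below is illustrative only, not LZR's actual ruleset or handshake logic.

```python
# Sketch of L7 fingerprinting in the spirit of LZR: classify a service
# by the first bytes it sends in response to a probe, independent of
# the port number. Signatures here are illustrative, not LZR's rules.

SIGNATURES = [
    (b"HTTP/", "http"),          # HTTP response status line
    (b"SSH-", "ssh"),            # SSH identification string
    (b"\x16\x03", "tls"),        # TLS handshake record framing
    (b"220 ", "ftp-or-smtp"),    # service-ready greeting
]

def fingerprint(banner: bytes) -> str:
    for prefix, proto in SIGNATURES:
        if banner.startswith(prefix):
            return proto
    return "unknown"

assert fingerprint(b"HTTP/1.1 200 OK\r\n") == "http"    # HTTP on any port
assert fingerprint(b"\x16\x03\x01\x02\x00") == "tls"
assert fingerprint(b"SSH-2.0-OpenSSH_8.4") == "ssh"
```

LZR's contribution is doing this at Internet scale with a handful of handshakes per host; this sketch only conveys why port numbers alone are a poor proxy for the protocol.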

Paper:  USENIX Security 2021

June 30, 2021 TLS Encrypted ClientHello

Speaker:  Chris Wood (Cloudflare)

Abstract:  TLS is one of the more important security protocols used on the Internet today. TLS 1.3, the latest version, brought significant performance, security, and, importantly, privacy improvements to the protocol. This version encrypts more sensitive data during the connection establishment phase of the protocol to limit what eavesdroppers can observe. However, one crucial protocol message remains unencrypted: the ClientHello. This message may carry sensitive information such as the destination a client is connecting to (example.com), supported application protocols, and even per-client authentication information. To keep this information equally safe from eavesdroppers, the TLS working group in the IETF is currently standardizing a mechanism to encrypt the entirety of the ClientHello, called TLS Encrypted ClientHello (ECH). In this talk, I'll give an overview of the ECH standardization effort in recent years, describing some of the flawed designs and active attacks discovered along the way, and discuss the current state of the protocol and its security analysis.
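The ECH structure can be sketched as an outer ClientHello that exposes only a public name, with the real ClientHello (carrying the sensitive SNI) sealed inside an extension. In the sketch below, `xor_seal` is a toy stand-in for HPKE, and the field names are illustrative, not the wire format from the ECH draft.

```python
# Structural sketch of TLS Encrypted ClientHello: the outer ClientHello
# shows eavesdroppers only a public name; the inner ClientHello with the
# real SNI travels inside an encrypted extension. xor_seal is a toy
# stand-in for HPKE; field names are illustrative, not the draft's
# wire format.
import hashlib
import json

def xor_seal(key: bytes, plaintext: bytes) -> bytes:
    # toy SHA-256 keystream XOR; real ECH uses HPKE
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(plaintext, stream))

xor_open = xor_seal  # XOR keystream: sealing and opening are identical

def build_client_hello(real_sni, public_name, ech_key):
    inner = json.dumps({"sni": real_sni, "alpn": ["h2"]}).encode()
    return {
        "sni": public_name,                      # all an eavesdropper sees
        "encrypted_client_hello": xor_seal(ech_key, inner),
    }

key = b"shared-with-client-facing-server"
hello = build_client_hello("example.com", "public.cdn.net", key)
assert hello["sni"] == "public.cdn.net"          # real destination hidden
inner = json.loads(xor_open(key, hello["encrypted_client_hello"]))
assert inner["sni"] == "example.com"             # server recovers it
```

The actual protocol must also handle key distribution via DNS, GREASE for ossification resistance, and the retry mechanism when the server's keys have rotated — several of the attacks discussed in the talk arise from exactly those corners.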

July 07, 2021 Addra: Metadata-private voice communication over fully untrusted infrastructure

Speaker:  Ishtiyaque Ahmad (UCSB)

Abstract:  Metadata from voice calls, such as the knowledge of who is communicating with whom, contains rich information about people's lives. Indeed, it is a prime target for powerful adversaries such as nation states. Existing systems that hide voice call metadata either require trusted intermediaries in the network or scale to only tens of users. In this talk, we will discuss the design, implementation, and evaluation of Addra, the first system for voice communication that hides metadata over fully untrusted infrastructure and scales to tens of thousands of users. At a high level, Addra follows a template in which callers and callees deposit and retrieve messages from private mailboxes hosted at an untrusted server. However, Addra improves message latency in this architecture, which is a key performance metric for voice calls. First, it enables a caller to push a message to a callee in two hops, using a new way of assigning mailboxes to users that resembles how a post office assigns PO boxes to its customers. Second, it innovates on the underlying cryptographic machinery and constructs a new private information retrieval scheme, FastPIR, that reduces the time to process oblivious access requests for mailboxes. An evaluation of Addra on a cluster of 80 machines on AWS demonstrates that it can serve 32K users with a 99th-percentile message latency of 726 ms, a 7× improvement over a prior system for text messaging in the same threat model.
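The mailbox-retrieval idea — fetching one record without the server learning which — can be conveyed with the classic two-server XOR-based PIR construction below. Note that Addra's FastPIR is a different, single-server lattice-based scheme; this sketch only illustrates the access pattern being hidden.

```python
# Minimal two-server XOR-based PIR: each server sees only a
# random-looking bit vector and learns nothing about which mailbox is
# fetched, yet XORing the two answers recovers the target record.
# (Addra's FastPIR is single-server and lattice-based; this classic
# construction just conveys the retrieval idea.)
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def client_queries(n_mailboxes, index):
    q1 = [secrets.randbelow(2) for _ in range(n_mailboxes)]
    q2 = list(q1)
    q2[index] ^= 1            # the two queries differ only at `index`
    return q1, q2

def server_answer(db, query):
    acc = bytes(len(db[0]))   # all-zero accumulator
    for record, bit in zip(db, query):
        if bit:
            acc = xor_bytes(acc, record)
    return acc

db = [b"msg-for-alice---", b"msg-for-bob-----", b"msg-for-carol---"]
q1, q2 = client_queries(len(db), 1)
recovered = xor_bytes(server_answer(db, q1), server_answer(db, q2))
assert recovered == b"msg-for-bob-----"
```

Each query vector in isolation is uniformly random, so neither server learns the index; the cost is that both servers must be non-colluding, which is exactly the assumption single-server schemes like FastPIR remove.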

July 14, 2021 CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU

Speaker:  Jeffrey Tan (UVA)

Abstract:  In this talk, I will present CRYPTGPU, a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit). Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. In this work, we start by introducing a new interface to losslessly embed cryptographic operations over secret-shared values (in a discrete domain) into floating-point operations that can be processed by highly-optimized CUDA kernels for linear algebra. We then identify a sequence of “GPU-friendly” cryptographic protocols to enable privacy-preserving evaluation of both linear and nonlinear operations on the GPU. Our microbenchmarks indicate that our private GPU-based convolution protocol is over 150× faster than the analogous CPU-based protocol; for non-linear operations like the ReLU activation function, our GPU-based protocol is around 10× faster than its CPU analog.
With CRYPTGPU, we support private inference and private training on convolutional neural networks with over 60 million parameters as well as handle large datasets like ImageNet. Compared to the previous state-of-the-art, when considering large models and datasets, our protocols achieve a 2× to 8× improvement in private inference and a 6× to 36× improvement for private training. Our work not only showcases the viability of performing secure multiparty computation (MPC) entirely on the GPU to enable fast privacy-preserving machine learning, but also highlights the importance of designing new MPC primitives that can take full advantage of the GPU’s computing capabilities.
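The MPC substrate underneath systems like this is additive secret sharing over a ring, where linear layers can be evaluated locally on shares and summing the results reconstructs the true output. The paper's 3-party replicated sharing and its GPU floating-point embedding are more involved; the sketch below shows only the linear-local property.

```python
# Additive secret sharing over a ring: a public linear layer can be
# applied by each party to its own share, and the shares of the output
# sum to the true output. This is the property that lets MPC systems
# push matrix multiplies and convolutions onto fast local kernels.
import random

RING = 2 ** 64

def share(x, parties=2):
    shares = [random.randrange(RING) for _ in range(parties - 1)]
    shares.append((x - sum(shares)) % RING)    # shares sum to x mod RING
    return shares

def reconstruct(shares):
    return sum(shares) % RING

def linear_layer_on_share(weights, x_share):
    # each party applies the public weight matrix to its own share
    return [sum(w * xi for w, xi in zip(row, x_share)) % RING
            for row in weights]

W = [[1, 2], [3, 4]]            # public weights
x = [10, 20]                    # private input
s0, s1 = zip(*(share(v) for v in x))   # split each coordinate
y0 = linear_layer_on_share(W, list(s0))
y1 = linear_layer_on_share(W, list(s1))
y = [reconstruct([a, b]) for a, b in zip(y0, y1)]
assert y == [50, 110]           # W @ x = [1*10+2*20, 3*10+4*20]
```

Non-linear operations like ReLU break this locality, which is why they need dedicated protocols — the part of the paper where the "GPU-friendly" protocol design matters most.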

July 21, 2021 Data Poisoning Won’t Save You From Facial Recognition

Speaker:  Evani Radiya-Dixit (Stanford)

Abstract:  Data poisoning has been proposed as a compelling defense against facial recognition models trained on Web-scraped pictures. By perturbing the images they post online, users can fool models into misclassifying future (unperturbed) pictures.
We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models, including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.
We evaluate two systems for poisoning attacks against large-scale facial recognition, Fawkes (500,000+ downloads) and LowKey. We demonstrate how an "oblivious" model trainer can simply wait for future developments in computer vision to nullify the protection of pictures collected in the past. We further show that an adversary with black-box access to the attack can (i) train a robust model that resists the perturbations of collected pictures and (ii) detect poisoned pictures uploaded online.
We caution that facial recognition poisoning will not admit an "arms race" between attackers and defenders. Once perturbed pictures are scraped, the attack cannot be changed, so any future successful defense irrevocably undermines users' privacy. Thus, we argue that legislative rather than technological solutions are needed to counteract privacy-invasive facial recognition systems.

July 28, 2021 Protecting Applications and Data on AWS EC2 using Amazon Nitro Enclaves

Speaker:  Yan Michalevsky (Anjuna Security)

Abstract:  Sensitive applications and data in the cloud are threatened by insider access. Insiders can include cloud provider employees, the company's own admins, or third parties such as governments or attackers who gain access to the cloud VM instances, where they can use privileged access to reach data and applications. Even when data is encrypted, it has to be decrypted and processed at some point, and the keys to it are present in memory.
Amazon Nitro Enclaves were announced and launched over the past year to provide a Trusted Execution Environment on EC2. Enclaves are completely isolated from the parent VM and can provide a safe haven for processing sensitive data. In this talk, we examine the technology and show how it can address some of today's most acute security problems, with minimal performance and usability trade-offs.

August 04, 2021 Zeph: Cryptographic Enforcement of End-to-End Data Privacy

Speaker:  Nicolas Küchler (ETH Zürich)

Abstract:  As we increasingly expose sensitive data to gain valuable insights, the need to natively integrate privacy controls in data analytics frameworks is growing in importance. Today, privacy controls are enforced by data curators with full access to data in the clear. However, a plethora of recent data breaches shows that even widely trusted service providers could be compromised. Additionally, there is no assurance that data processing and handling comply with the claimed privacy policies. This motivates the need for a new approach to data privacy that can provide strong assurance and control to users.
In this talk, I'll present Zeph, a system that enables users to set privacy preferences on how their data can be shared and processed. Zeph enforces privacy policies cryptographically and ensures that data available to third-party applications complies with users' privacy policies. Zeph executes privacy-adhering data transformations in real-time and scales to thousands of data sources, allowing it to support large-scale low-latency data stream analytics. We introduce a hybrid cryptographic protocol for privacy-adhering transformations of encrypted data. We develop a prototype of Zeph on Apache Kafka to demonstrate that Zeph can perform large-scale privacy transformations with low overhead and latencies.
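The flavor of a privacy-adhering transformation can be sketched with masked aggregation: each data source uploads only a masked reading, the masks cancel on aggregation, and the server learns the sum but no individual value. Zeph's actual hybrid protocol (and its streaming integration on Kafka) is considerably richer; the mask layout below is illustrative only.

```python
# Sketch of masked aggregation in the spirit of Zeph's privacy
# transformations: per-source masks sum to zero, so the aggregator
# learns only the total, never an individual reading. Zeph's real
# hybrid cryptographic protocol is richer; this shows only the
# cancellation idea.
import random

RING = 2 ** 32

def make_masks(n_sources):
    # masks that sum to zero mod RING across all sources
    masks = [random.randrange(RING) for _ in range(n_sources - 1)]
    masks.append((-sum(masks)) % RING)
    return masks

readings = [17, 4, 21]                    # private per-source values
masks = make_masks(len(readings))
uploads = [(r + m) % RING for r, m in zip(readings, masks)]

aggregate = sum(uploads) % RING           # what the server computes
assert aggregate == sum(readings)         # masks cancel; only sum leaks
```

A deployed system additionally has to handle how sources agree on masks without a trusted dealer and what happens when a source drops out mid-epoch — both of which drive real protocol complexity.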

Paper:  USENIX OSDI 2021

August 11, 2021 USENIX Security 2021

August 18, 2021 CRYPTO 2021

August 25, 2021 On the Infrastructure Providers that Support Misinformation Websites

Speaker:  Catherine Han (Stanford)

Abstract:  In this paper, we analyze the service providers that power 440 misinformation sites, including hosting platforms, domain registrars, DDoS protection companies, advertising networks, donation processors, and e-mail providers. We find that several providers are disproportionately responsible for hosting misinformation websites compared to mainstream websites. Most prominently, Cloudflare offers DDoS protection to 34.3% of misinformation sites while servicing only 17.9% of mainstream websites in our corpus. While many mainstream providers continue to service misinformation sites, we show that when misinformation and other abusive websites are removed by hosting providers, DDoS protection services, and registrars, these sites nearly always resurface after finding alternative providers. More encouragingly, we show that misinformation sites also disproportionately rely on popular ad networks and donation processors, but that anecdotally, sites struggle to remain online when mainstream monetization channels are severed. We conclude with insights for infrastructure providers and researchers to consider for stopping the spread of online misinformation.

Preprint:  https://zakird.com/papers/misinfo-infra-preprint.pdf

September 01, 2021 Building E2EE and User Identity

Speaker:  Merry Ember Mou (Zoom Video Communications)

Abstract:  In October 2020, Zoom launched end-to-end encryption (E2EE) for our video conferencing platform. Integrating E2EE into an existing widely-used system has required particular consideration of architectural constraints, user expectations in the UI/UX, and product requirements. In this talk, I will highlight some of the design and implementation objectives and challenges from our initial release. In addition, I will describe our subsequent phases of improvements to E2EE, in which we are building a user-friendly notion of identity that is externally auditable and backed by a trusted third-party identity provider. With each phase, we aim to make verifying meeting participants' identities (the "ends" of E2EE) as intuitive as possible.

September 08, 2021 No Calm in the Storm: Investigating QAnon and Conspiracy Websites Relationships

Speaker:  Hans Hanley (Stanford)

Abstract:  QAnon is a far-right conspiracy theory whose followers largely organize online. In this work, we use web crawls seeded from two of the largest QAnon hotbeds on the Internet, Voat and 8kun, to build a hyperlink graph. We then use this graph to identify, understand, and learn from the websites that spread QAnon content online. We curate the largest list of QAnon-centered websites to date, from which we document the types of QAnon sites, their hosting providers, and their popularity. We further analyze QAnon websites’ connections to mainstream news and misinformation online, highlighting the outsized role misinformation websites play in spreading the conspiracy. Finally, we leverage the observed relationship between QAnon and misinformation sites to build a random forest classifier that distinguishes between misinformation and authentic news sites, achieving 0.98 AUC on a test set. We further show the generalizability of using other easily identifiable conspiracy-oriented websites to label misinformation. Our results demonstrate new and effective ways to study conspiracy and misinformation on the Internet.
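The classification idea — that a site's hyperlink neighborhood is predictive of whether it spreads misinformation — can be reduced to a one-feature sketch: score a site by the fraction of its outlinks pointing at known misinformation domains and threshold. The paper trains a random forest over much richer hyperlink-graph features; the domains and threshold below are made up for illustration.

```python
# One-feature stand-in for the paper's classifier: the fraction of a
# site's outlinks that hit known misinformation domains, thresholded.
# The actual work uses a random forest over richer graph features;
# domain names and the threshold here are hypothetical.

KNOWN_MISINFO = {"fakenews.example", "conspiracyhub.example"}

def misinfo_link_fraction(outlinks):
    if not outlinks:
        return 0.0
    return sum(d in KNOWN_MISINFO for d in outlinks) / len(outlinks)

def classify(outlinks, threshold=0.5):
    frac = misinfo_link_fraction(outlinks)
    return "misinformation" if frac >= threshold else "authentic"

site_a = ["fakenews.example", "conspiracyhub.example", "blog.example"]
site_b = ["news.example", "wire.example", "fakenews.example"]
assert classify(site_a) == "misinformation"   # 2/3 of links flagged
assert classify(site_b) == "authentic"        # 1/3 of links flagged
```

The same pattern — seed labels from easily identifiable conspiracy sites, then propagate through the link graph — is what the abstract's generalizability claim refers to.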