Stanford Security Lunch
Summer 2022

June 22, 2022 Canceled due to campus power outage

June 29, 2022 Building Effective Differentially Private Language Models

Speaker:  Xuechen (Chen) Li

Abstract:  Large neural language models have demonstrated impressive abilities in tasks involving text and have become the powerhouse for many industry applications of NLP. At the same time, such models can memorize and regurgitate training data that contains sensitive information, which presents a serious privacy risk for their deployment. Differential privacy offers a formal guarantee, but enforcing it in machine learning and deep learning algorithms has historically led to significant performance degradation. In this talk, I will present recent work with collaborators in which we demonstrated one of the first successes in building large differentially private language models with performance approaching that of previous non-private models. Ideas in this work have formed the building blocks of recent attempts at building privacy-preserving NLP applications at Microsoft.
The basic DP-SGD algorithm we adopt in our work was theoretically shown to inevitably exhibit bad dimension-dependent error degradation for convex and Lipschitz losses in a minimax sense. Yet, our empirical results suggested that good performance can nevertheless be attained even when one optimizes over hundreds of millions of parameters. This presents an apparent discrepancy between theory and practice. In the second part of the talk, I will present another work in which collaborators and I theoretically show that vanilla DP-SGD can obtain dimension-independent errors if the loss function satisfies additional conditions. Empirically, we present experimental results in language-model fine-tuning settings that suggest these additional conditions are likely met in practice.
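For context, DP-SGD differs from ordinary SGD in two places: each example's gradient is clipped to a fixed norm before aggregation, and Gaussian noise calibrated to that clipping bound is added to the sum. Below is a minimal plain-Python sketch of one such step (not the speaker's implementation; `dp_sgd_step` and its parameter names are illustrative):

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr, rng):
    """One DP-SGD step: clip each example's gradient to clip_norm,
    sum the clipped gradients, add Gaussian noise scaled to the
    clipping bound, average, and apply the update."""
    d = len(params)
    summed = [0.0] * d
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))  # per-example clipping
        for i in range(d):
            summed[i] += g[i] * scale
    n = len(per_example_grads)
    sigma = noise_mult * clip_norm  # noise calibrated to the clipping bound
    noisy_avg = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_avg)]
```

With `noise_mult = 0` this reduces to ordinary clipped SGD; the privacy guarantee comes from a positive noise multiplier together with accounting across all steps.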

July 06, 2022 Privacy Threat Modeling

Speaker:  Cara Bloom (MITRE)

Abstract:  This applied research talk will discuss the privacy threat modeling gap, the challenges and opportunities of privacy threat modeling in practice, and a new qualitative threat model currently under development. In privacy risk management, there are well-respected methods for modeling vulnerabilities and consequences (or harms), but no commonly used model or lexicon for characterizing privacy threats. We will discuss the gap in privacy risk modeling, how privacy threat-informed defense could better protect systems from privacy harms, and a working definition of a “privacy attack.” Then we will present a draft qualitative threat model – the Privacy Threat Taxonomy – developed to fill this gap in privacy risk modeling. This model was generated iteratively and collaboratively using a dataset of almost 150 non-breach privacy events, including directed, accidental, and passive attacks on systems. We will also discuss how practitioners can incorporate a threat model into their privacy risk management program.

July 13, 2022 Waldo: A Private Time-Series Database from Function Secret Sharing

Speaker:  Emma Dauterman (Berkeley)

Abstract:  Applications today rely on cloud databases for storing and querying time-series data. While outsourcing storage is convenient, this data is often sensitive, making data breaches a serious concern. We present Waldo, a time-series database with rich functionality and strong security guarantees: Waldo supports multi-predicate filtering, protects data contents as well as query filter values and search access patterns, and provides malicious security in the 3-party honest-majority setting. In contrast, prior systems such as Timecrypt and Zeph have limited functionality and security: (1) these systems can only filter on time, and (2) they reveal the queried time interval to the server. Oblivious RAM (ORAM) and generic multiparty computation (MPC) are natural choices for eliminating leakage from prior work, but both of these are prohibitively expensive in our setting due to the number of roundtrips and bandwidth overhead, respectively. To minimize both, Waldo builds on top of function secret sharing, enabling Waldo to evaluate predicates without client interaction. We develop new techniques for applying function secret sharing to the encrypted database setting where there are malicious servers, secret inputs, and chained predicates. With 32-core machines, Waldo runs a query with 8 range predicates over records in 3.03s, compared to 12.88s for an MPC baseline and 16.56s for an ORAM baseline. Compared to Waldo, the MPC baseline uses 9–82× more bandwidth between servers (for different numbers of records), while the ORAM baseline uses 20–152× more bandwidth between the client and server(s) (for different numbers of predicates). This talk is based on joint work with Mayank Rathee, Raluca Ada Popa, and Ion Stoica that appeared at IEEE S&P'22.
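To give a feel for the function-secret-sharing interface Waldo builds on, here is a deliberately naive two-server scheme for a point function: the dealer additively shares the function's entire truth table, so each server's answer alone looks random, yet the two answers sum to f(x). This toy (hypothetical names `fss_gen`/`fss_eval`, not Waldo's construction) has keys linear in the domain size; practical distributed point functions achieve logarithmic-size keys, which is what makes server-side predicate evaluation efficient:

```python
import random

MOD = 2 ** 32  # share arithmetic is done modulo a fixed ring size

def fss_gen(domain_size, target, rng):
    """Split the point function f(x) = 1 if x == target else 0
    into two keys: k0 is a random table, k1 is the correction so
    that (k0[x] + k1[x]) mod MOD equals f(x) for every x."""
    k0 = [rng.randrange(MOD) for _ in range(domain_size)]
    k1 = [(int(x == target) - k0[x]) % MOD for x in range(domain_size)]
    return k0, k1

def fss_eval(key, x):
    """Each server evaluates its key at x independently; a single
    key reveals nothing about the target."""
    return key[x]
```

The client reconstructs f(x) by adding the two servers' outputs modulo `MOD`; neither server learns which index the query targets.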

July 20, 2022 A Tale of Two Markets: Investigating the Ransomware Payments Economy

Speaker:  Jack Cable

Abstract:  Ransomware attacks are among the most severe cyber threats. They have made headlines in recent years by threatening the operation of governments, critical infrastructure, and corporations. Collecting and analyzing ransomware data is an important step towards understanding the spread of ransomware and designing effective defense and mitigation mechanisms. We report on our experience operating Ransomwhere, an open crowdsourced ransomware payment tracker, to collect information from victims of ransomware attacks. With Ransomwhere, we have gathered 13.5k ransom payments to more than 87 ransomware criminal actors, with total payments of more than $101 million. Leveraging the transparent nature of Bitcoin, the cryptocurrency used for most ransomware payments, we characterize the evolving ransomware criminal structure and ransom laundering strategies. Our analysis shows that there are two parallel ransomware criminal markets: commodity ransomware and Ransomware as a Service (RaaS). We notice striking differences between the two markets in the way cryptocurrency resources are utilized, revenue per transaction, and ransom laundering efficiency. Although it is relatively easy to identify choke points in commodity ransomware payment activity, it is more difficult to do the same for RaaS.
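The per-market comparison of revenue per transaction described above amounts to grouping payments by actor and averaging. A small sketch over hypothetical records (the schema and the helper `summarize_payments` are made up for illustration, not the Ransomwhere data model):

```python
from collections import defaultdict

def summarize_payments(payments):
    """Aggregate (family, amount_usd) payment records into total
    revenue, payment count, and mean revenue per transaction."""
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for family, amount_usd in payments:
        totals[family]["total"] += amount_usd
        totals[family]["count"] += 1
    return {
        fam: {**t, "mean": t["total"] / t["count"]}
        for fam, t in totals.items()
    }
```

On real blockchain data the interesting step is attributing addresses to actors in the first place; the aggregation itself is the easy part.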

July 27, 2022 Reflections on Trusting Identifiers: The Foundations of Social Engineering

Speaker:  Zane Ma (Georgia Tech)

Abstract:  Since first gaining popularity in the late 1990s, social engineering such as phishing has stubbornly persisted as one of the most prevalent forms of online crime. Moreover, social engineering is often an entry point for additional attacks such as ransomware, trojans, APTs, and other malware. Social engineering is not a new phenomenon and has been studied for over 20 years, so why does it remain one of the most prevalent security issues? In this talk, we posit that security researchers and practitioners have largely focused on symptoms of social engineering (e.g., identifying phish-y domain names, performing content analysis on phishing emails/sites, etc.) rather than its root cause: misplaced user trust. Social engineering only succeeds when 1) a user mistakes a malicious online entity for a trusted one or 2) a user misplaces their trust in a new, unknown entity.
This talk discusses two usability challenges related to web identity and trustworthiness, the underpinnings of all social engineering. The first study explores how and why users mis-identify online entities through a usability study of the most ubiquitous online identifier: the URL. The second study measures how HTTPS/TLS, which provide only connection privacy and authenticity, can act as misleading indicators of trustworthiness. Ultimately, the talk proposes future directions for social engineering research centered around usable identifiers and trust systems.
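One concrete source of URL mis-identification is that users tend to read host names left to right, while ownership is determined from the right. A short illustration using Python's standard library (the attacker domain below is hypothetical):

```python
from urllib.parse import urlsplit

def effective_host(url):
    """Return the host the browser will actually connect to.
    A lookalike such as 'paypal.com.attacker.example' starts with
    a familiar brand, but the controlling registered domain is
    read from the right-hand side of the host name."""
    return urlsplit(url).hostname
```

Running `effective_host("https://paypal.com.attacker.example/login")` shows the full host is `paypal.com.attacker.example`, controlled by whoever registered `attacker.example`, even though the URL visually leads with `paypal.com`.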

August 03, 2022 Hate Raids on Twitch: Echoes of the Past, New Modalities, and Implications for Platform Governance

Speaker:  Catherine Han

Abstract:  In the summer of 2021, users on the livestreaming platform Twitch experienced a wave of "hate raids," an attack that overwhelms a target's chatroom with hateful messages --- often with the aid of bots and automation. Utilizing a mixed-methods approach, we combine interviews with both streamers and third-party bot developers with a quantitative measurement of this phenomenon across the platform. We find evidence that hate raids are highly targeted, hate-driven attacks. We also observe another mode of hate raid similar to trolling and networked harassment. We show that the content of these hate raid messages is most commonly rooted in anti-Black racism and antisemitism, and we observe that the threat of these attacks elicited both proactive and rapid reactive community responses. These results have implications for livestreaming platform design: defenses that better account for the experiences of at-risk communities, and the division of labor between community moderators, tool-builders, and platforms.
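As a rough illustration of the quantitative side, a raid-style flood of chat messages can be flagged with a simple sliding-window rate check. This is a sketch of one possible detector, not the paper's measurement methodology, and the window and threshold values are arbitrary:

```python
from collections import deque

def detect_bursts(timestamps, window_s=10.0, threshold=50):
    """Return the timestamps at which more than `threshold` messages
    have arrived within the trailing `window_s`-second window --- a
    crude proxy for a raid-style flood. `timestamps` must be sorted
    in ascending order (seconds)."""
    window = deque()
    bursts = []
    for t in timestamps:
        window.append(t)
        while window and window[0] < t - window_s:
            window.popleft()  # drop messages older than the window
        if len(window) > threshold:
            bursts.append(t)
    return bursts
```

A real pipeline would also need to separate organic hype spikes from coordinated attacks, e.g. by combining rate with message content and sender account age.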

August 10, 2022 USENIX Security

August 17, 2022 CRYPTO