Introduction to Privacy Enhancing Cryptographic Techniques: Secure Multiparty Computation

Securely combining data from multiple sources while preserving privacy

By: Betty Ann Bryanton, Canada Revenue Agency

Introduction

The increasing prevalence of technologies such as cloud computing, mobile computing, machine learning (ML), and the Internet of Things (IoT) creates opportunities for innovation and information sharing, but also creates challenges for data security and privacy. These challenges have been amplified during the global pandemic, as working from home has driven faster adoption of hybrid and cloud services. This situation has strained existing security capabilities and exposed gaps in data security (Lowans, 2020). Meanwhile, global data protection legislation is maturing, and every organization that processes personal data faces higher privacy and non-compliance risks than ever before (Wonham, Fritsch, Xu, de Boer, & Krikken, 2020).

As a result, privacy-enhancing computation techniques such as Secure Multiparty Computation, which protect data while it is being used (Footnote 1), have been gaining popularity.

What is Secure Multiparty Computation?

Secure Multiparty Computation (SMPC) is a technique for combining information from different privacy zones to obtain insights on the combined data without revealing the raw data to the involved parties. It has evolved from a theoretical curiosity, introduced by Andrew Yao through his Millionaires' Problem (Footnote 2) in the 1980s, into an important tool for building large-scale privacy-preserving applications (Footnote 3).

To illustrate the concept, suppose Bob and Alice want to know whether they are paid the same, but neither wants to ask this awkward question. They buy four lockable suggestion boxes, each labelled with an hourly rate in dollars: 10, 20, 30, 40. Bob earns $20/hr, so he has a key only to the box labelled 20. Alice earns $30/hr, so she has a key only to the box labelled 30. Unseen by the other, each of them puts a slip of paper reading 'yes' or 'no' into every box. For example, Alice puts 'no' into boxes 10, 20 and 40 and 'yes' into box 30, matching the $30/hr she makes. Bob unlocks box 20 and learns that Alice is not paid $20/hr, but still does not know whether her hourly rate is $10, $30 or $40. Alice unlocks box 30 and learns that Bob does not make $30/hr, but does not know whether his hourly rate is $10, $20 or $40. This is the idea behind 'oblivious transfer', and the ability to perform oblivious transfers is the basis for performing SMPC (Footnote 4).
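In practice the suggestion boxes are replaced by a cryptographic protocol. As a minimal sketch only, the Python below follows a textbook RSA-based oblivious transfer construction, generalized here to four messages, so that Bob retrieves Alice's slip for box 20 without Alice learning which box he chose, and without Bob learning her other three slips. The tiny primes, the encoding of 'yes'/'no' as 1/0, and all function names are illustrative assumptions; the key size offers no real security, and the sketch assumes both parties follow the protocol honestly.

```python
import secrets

# Toy RSA key: textbook-sized primes, NOT secure (real keys use moduli of 2048+ bits).
P, Q = 61, 53
N = P * Q                          # public modulus, 3233
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # Alice's private exponent (needs Python 3.8+)

RATES = [10, 20, 30, 40]           # labels on the four boxes


def sender_setup(n_msgs):
    """Alice publishes one fresh random value per box."""
    return [secrets.randbelow(N) for _ in range(n_msgs)]


def receiver_query(xs, choice):
    """Bob blinds the value of his chosen box with k^E; v looks uniformly random to Alice."""
    k = secrets.randbelow(N)
    v = (xs[choice] + pow(k, E, N)) % N
    return k, v


def sender_respond(xs, v, messages):
    """Alice masks every message with (v - x_i)^D; only the chosen box's mask equals k."""
    return [(m + pow((v - x) % N, D, N)) % N for m, x in zip(messages, xs)]


def receiver_recover(masked, choice, k):
    """Bob strips his own mask from the one message he is able to unlock."""
    return (masked[choice] - k) % N


alice_slips = [1 if rate == 30 else 0 for rate in RATES]  # 1 = 'yes', 0 = 'no'
bob_choice = RATES.index(20)                              # Bob's key fits box 20 only

xs = sender_setup(len(RATES))
k, v = receiver_query(xs, bob_choice)
masked = sender_respond(xs, v, alice_slips)
answer = receiver_recover(masked, bob_choice, k)
print("Alice earns $20/hr" if answer == 1 else "Alice does not earn $20/hr")
# prints: Alice does not earn $20/hr
```

Alice's direction works the same way with the roles swapped: she acts as the receiver, choosing box 30, while Bob acts as the sender of his four slips.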

SMPC is a method of distributed computing and cryptography (Footnote 5) that combines data transformation (encryption) with specialized software. It enables multiple parties who do not trust each other, or any common third party, to jointly compute on data that depends on all of their private inputs while keeping those inputs encrypted. Participants learn only the results of the collaboration, not the specific data others contributed. This enables collaboration between trusted partners or even between competitors.
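One widely used way to compute jointly on private inputs is additive secret sharing, which hides each input by splitting it into random-looking pieces rather than by encrypting it. The sketch below is an illustration under stated assumptions, not a description of any particular product: three hypothetical parties learn their combined payroll total, while each party only ever handles random-looking shares of the others' salaries. The modulus, party names and figures are all illustrative.

```python
import secrets

MODULUS = 2**61 - 1  # shares are integers mod this value (assumes every total stays below it)


def share(secret, n_parties):
    """Split a secret into n shares that look random individually but sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    return sum(shares) % MODULUS


salaries = {"Alice": 62_400, "Bob": 41_600, "Carol": 55_000}  # hypothetical private inputs
parties = list(salaries)

# Each party splits its own salary and hands share j to party j.
dealt = {p: share(s, len(parties)) for p, s in salaries.items()}

# Party j adds up the shares it received; its partial sum alone reveals nothing.
partial_sums = [sum(dealt[p][j] for p in parties) % MODULUS for j in range(len(parties))]

# Publishing only the partial sums reveals the total and nothing else.
total = reconstruct(partial_sums)
print(total)                  # 159000
print(total / len(parties))   # 53000.0
```

Because shares add up correctly, linear functions such as sums, counts and averages can be computed this way without further interaction; multiplications and comparisons require additional protocol rounds, which accounts for much of the overhead listed under Challenges.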

SMPC is often assumed to require the participation of multiple organizations; however, the specific requirement is for multiple privacy zones, i.e., two or more domains with different sets of privacy restrictions. Multiple privacy zones exist across multiple organizations with independent data owners, but they may also exist within a single organization across teams, departments, and/or jurisdictions.

In the basic setting described here (often called the semi-honest or 'honest-but-curious' model), parties are trusted to follow the protocol. If a party is not trusted, additional measures, outside the scope of this paper, are required to prevent malicious or covert breach attempts.

Strengths

  • Simultaneously achieves privacy, obliviousness, and authenticity
    • Eliminates the need to trust a third-party data broker to access and process the data
    • Allows inference on encrypted data: the model owner never sees the client's private data and therefore cannot leak or misuse it
  • Eliminates the trade-off between data usability and data privacy: since the raw data is encrypted, there is no need to mask or drop any features in order to share and process it
  • Opens new opportunities for enterprise collaborations that were not previously possible due to regulation or risk
  • Offers confidentiality levels similar to Fully Homomorphic Encryption (FHE) with lower computational cost and complexity
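To make the point about inference concrete, the sketch below scores a simple linear model on a client's secret-shared features. It assumes two non-colluding servers that both know the model's integer weights (keeping the weights secret as well would require secure multiplication, which this sketch omits); the weights, features and ring size are hypothetical. Each server sees only one random-looking share of each feature, yet the two partial results add up to the true score.

```python
import secrets

RING = 2**64  # additive shares live in integers mod 2^64 (assumes |score| < 2^63)


def share_two_ways(x):
    """Split x into two shares; either share on its own is uniformly random."""
    r = secrets.randbelow(RING)
    return r, (x - r) % RING


def to_signed(v):
    """Map a ring element back to a signed integer."""
    return v - RING if v >= RING // 2 else v


def partial_score(weights, feature_shares):
    """Run locally by each server on its own shares; no communication is needed."""
    return sum(w * s for w, s in zip(weights, feature_shares)) % RING


weights = [3, -2, 5]    # model weights known to both servers (hypothetical)
features = [10, 4, 7]   # client's private feature vector (hypothetical)

pairs = [share_two_ways(x) for x in features]
server_a = [a for a, _ in pairs]   # the client sends these shares to server A
server_b = [b for _, b in pairs]   # ...and these shares to server B

# The client adds the two partial results to obtain the model's score.
score = to_signed((partial_score(weights, server_a) + partial_score(weights, server_b)) % RING)
print(score)  # 57, i.e. 3*10 - 2*4 + 5*7
```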

Challenges

  • SMPC techniques are highly complex and rely on sophisticated cryptography, so they are difficult for non-experts to understand or implement
  • Inability to see the input data may foster suspicion
  • If functions are not carefully crafted and tested, security can be broken
  • Significant computational overhead due to its complexity and distributed nature. Cost varies greatly depending on the collaboration required (e.g., number of parties, use of different cloud providers) and on the need for protection against malicious parties
  • Sensitive to latency between nodes (Krikken, 2019)
  • Requires additional infrastructure, which adds to project planning and the total cost of ownership (Byun, 2019)
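The point about carefully crafted functions applies even when the protocol itself is flawless: SMPC hides the inputs, not what can be deduced from the output. In the hypothetical sketch below, ideal_average stands in for a perfectly secure protocol that reveals nothing but the average. With only two participants, Bob can still solve for Alice's rate exactly; with three, he learns only the combined rate of the other two.

```python
def ideal_average(rates):
    """Stand-in for an ideal SMPC protocol: reveals the average and nothing else."""
    return sum(rates) / len(rates)


bob_rate = 20

# Two participants: the published average plus Bob's own input pins down Alice's rate.
avg_two = ideal_average([bob_rate, 30])
alice_rate_inferred = 2 * avg_two - bob_rate
print(alice_rate_inferred)  # 30.0

# Three participants: Bob learns only the combined rate of the other two.
avg_three = ideal_average([bob_rate, 30, 40])
others_total = 3 * avg_three - bob_rate
print(others_total)  # 70.0
```

Typical safeguards include requiring a minimum number of contributors per published result, or adding calibrated noise to outputs as in differential privacy.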

Why is it important?

According to the U.S. Director of National Intelligence, U.S. organizations in both the public and private sectors are at continual cybersecurity risk and should expect increasing attacks. Organizations rich with data and intellectual property (IP) are prime targets. Attackers often target this 'Crown Jewel' (Footnote 6) data because of its value and the potential for disruption (Enveil).

Organizations are increasingly concerned about data security in several scenarios, including:

  • collecting and retaining sensitive personal information;
  • processing personal information in external environments, such as the cloud; and
  • information sharing, such as sharing and working on sensitive data in distributed settings, from healthcare to finance (Krikken, 2019).

SMPC can alleviate these concerns by allowing organizations to share insights on distributed data compliantly, securely, and privately, without ever exposing or moving the data itself.

This is important because customer data is increasingly distributed, so many organizations do not generate enough data on their own to derive the unbiased insights needed to provide new experiences, open new revenue streams, and apply new business models. SMPC enables secure collaboration that benefits all parties while preserving privacy and confidentiality.

Real World Applications

Though it is still emerging and faces challenges, SMPC is poised to significantly disrupt the enterprise data exchange space and to enable successful data sharing among mutually distrusting data owners. Notable successful deployments are listed below (Footnote 7).

  • Danish sugar beet auction (2008), the first successful deployment of SMPC, which kept farmers' bids for contracts private
  • Boston Women's Workforce Council (Footnote 8) gender and wage gap studies, first conducted in 2016, which analyzed payroll data from multiple employers to serve as a roadmap for change for the city and its employers
  • Estonian government study (2015), which analyzed tax and education records to determine whether working part-time while studying increased failure rates (Footnote 9)

Use Cases

SMPC is well suited to use cases where organizations need to share data with, and/or analyze data from, multiple parties without disclosing their data and/or their analytics models to each other.

This list illustrates the range and scale of SMPC applications.

  • Collaboration with disparate parties, e.g., sharing citizen data among government departments and/or financial institutions; sharing electronic medical records among hospitals, pharmacies and insurers
  • Distributed data mining: collecting private data from independent data sources to learn something that is not possible from a single source, e.g., finding fraudulent taxpayers via private business data or other taxpayer data
  • Key management: safeguarding authentication keys as they are being used
  • Cloud computing: data exchange, data analytics, and ML across multiple, unknown cloud providers
  • Multi-network security monitoring across entities to aggregate private data
  • Spam filtering on encrypted email
  • Medical discovery, e.g., disease or virus contact tracing apps, or combining data from many hospitals for genomics research
  • Satellite collision avoidance without operators disclosing their satellites' locations
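For the key-management use case, a common building block is Shamir's secret sharing: a key is split so that any threshold number of shares can recover it, while fewer shares reveal nothing about it. The sketch below shows only splitting and recombining, under assumed parameters (a 3-of-5 threshold over the prime field 2^127 - 1, with a random 120-bit integer standing in for a key). Threshold signing or decryption schemes go further and use the shares without ever reassembling the key.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime defining the field; the secret must be smaller than this


def split(secret, threshold, n_shares):
    """Hide the secret as f(0) of a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n_shares + 1)]


def combine(shares):
    """Lagrange interpolation at x = 0 recovers f(0), i.e. the secret."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbits(120)                   # hypothetical stand-in for a secret key
shares = split(key, threshold=3, n_shares=5)  # e.g. one share per key custodian
print(combine(shares[:3]) == key)                         # True: any 3 shares suffice
print(combine([shares[0], shares[2], shares[4]]) == key)  # True
```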

Conclusion

Awareness is increasing that personal data can be compromised in a data breach or abused by companies whose interests do not align with those of their users. New regulations make holding personal data a liability risk for companies. SMPC has emerged as a powerful and versatile technique for gaining insights from shared data without ever exposing it directly.

Although there is no single product or technique that can satisfy every data security requirement, SMPC can be used as one defense alongside other data protection measures, such as data masking, and other privacy-preserving techniques, such as differential privacy and homomorphic encryption.

What's Next?

Gartner expects SMPC to be transformational in the next 5 to 10 years (Lowans, 2020). Given the amount of private data that many organizations hold and the pressure to safeguard it, interested organizations should continue to research SMPC and other privacy-preserving data protection techniques in order to be prepared.

Related Topics: data anonymization, differential privacy, homomorphic encryption, trusted execution environments / confidential computing, federated learning


References