This summer saw an increased need for flexible services that could be accessed outside traditional networks and scale rapidly, all while maintaining the security of information entrusted to the public service. The opportunity for data science to provide timely insights to decision makers and the public alike has never been so great, but data scientists also need to ensure that data and workflows operate in secure environments. Cloud computing has obvious benefits for data scientists. Recent developments in Government of Canada (GC) policy, along with the cloud services made available through Shared Services Canada, now make it possible to extend those benefits to protected workloads.
New cloud policy directives
The GC began adopting public cloud infrastructure as early as 2014. At the time, policy on the use of cloud was unclear. Putting any protected information in the cloud was considered a high-risk proposition, so only data science projects using unclassified data could be carried out there.
The Cloud Adoption Strategy was developed in 2016 in response to the lack of clear direction on the use of public cloud. Together with the Direction on the Secure Use of Commercial Cloud Services and the Direction for Electronic Data Residency, both released in 2017, it clarified how to use public cloud infrastructure for unclassified workloads in a way that aligned with GC policy. Starting in 2018, the GC adopted a cloud-first policy stance and began laying the groundwork for using public cloud services for protected workloads. The Cloud Services Framework Agreements from Shared Services Canada and the newly released Directive on Service and Digital provide the final pieces of policy direction for departments to move workloads up to the Protected B, Medium Integrity, Medium Availability (PBMM) level.
At this point, not using cloud infrastructure requires an exception from the GC Enterprise Architecture Review Board. The policy-level roadblocks to using cloud infrastructure and highly distributed data processing have been removed, and data science teams can work with their IT services to use the cloud to support their workloads effectively.
Data residency vs. data sovereignty
Data residency refers to the physical or geographical location of an organization’s digital information while at rest. The departmental Chief Information Officer (CIO) is responsible for ensuring that Protected B data reside in Canada, so that the data are subject to the protections afforded by Canadian law. Residency requirements apply to data at rest, not to data in transit.
Data sovereignty concerns other nations seeking to apply their laws to Canadian data, regardless of where the data reside geographically. It covers access to data both in transit and at rest. Sovereignty is a question of risk, which is why the whitepaper on Data Sovereignty and Public Cloud was produced.
While the vast majority of protected data are expected to stay in Canada, there are provisions for considering other options when this is not possible. CIOs are responsible for evaluating these options against a set of criteria that, at a minimum, include:
- reputation of the department and GC
- legal aspects and agreements
- business value provided by the service
- market availability
- technical capabilities
The distinction between data at rest and data in transit matters for data science workloads, because some of a cloud provider’s services may run outside your preferred data storage region. Allowing data to be transmitted securely across geographic regions may be the difference between using a pre-built machine learning service and having to build your own. Whether such services benefit a project must be assessed case by case with the business owner. Understanding the data flows, and the risks of using different platforms and tools, is an important step in getting projects deployed into production.
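To make the at-rest versus in-transit distinction concrete, here is a minimal Python sketch of a data-flow review. It assumes each pipeline step can be described by the region it uses, whether data persist there, and whether transfers to it are encrypted. The `DataFlow` class, the `review_flows` function and the region allow-list are hypothetical, and region names should be confirmed with your provider. Encrypted, transit-only use of a foreign region is flagged for assessment rather than rejected outright, since sovereignty risk still applies to data in transit.

```python
from dataclasses import dataclass

# Hypothetical allow-list of Canadian regions; confirm exact names with your provider.
CANADIAN_REGIONS = {"ca-central-1", "canadacentral", "canadaeast"}


@dataclass(frozen=True)
class DataFlow:
    name: str                   # pipeline step, e.g. "raw storage"
    region: str                 # region where the step stores or processes data
    stores_data: bool           # True if data persist at rest in that region
    encrypted_in_transit: bool  # True if transfers to the region are encrypted


def review_flows(flows):
    """Return a finding for each flow that leaves Canada, in input order."""
    findings = []
    for flow in flows:
        if flow.region in CANADIAN_REGIONS:
            continue  # Canadian region: no residency finding in this sketch
        if flow.stores_data:
            findings.append(f"{flow.name}: data at rest outside Canada ({flow.region})")
        elif not flow.encrypted_in_transit:
            findings.append(f"{flow.name}: unencrypted transfer to {flow.region}")
        else:
            findings.append(
                f"{flow.name}: encrypted, transit-only use of {flow.region}; "
                "assess case by case with the business owner"
            )
    return findings


flows = [
    DataFlow("raw storage", "ca-central-1", True, True),
    DataFlow("pre-built ML service", "us-east-1", False, True),
    DataFlow("model artifacts", "us-west-2", True, True),
]
for finding in review_flows(flows):
    print(finding)
```

With these inputs the raw storage step passes, the pre-built service is flagged for a case-by-case decision, and the model artifacts are flagged as a residency problem.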
Building on a secure, compliant foundation
Under traditional IT infrastructure deployments, meeting organizational compliance requirements could take a significant amount of time. This often delayed the delivery of systems and slowed the pace of business units. Making matters more difficult for data scientists, compliance requirements vary and evolve over time, and it takes a dedicated professional to keep up with them. Developing and maintaining a controlled environment requires ongoing investment at multiple levels of the IT stack. Adopting public cloud infrastructure allows the GC to inherit the provider’s implementation of global security and compliance controls, helping to ensure high standards of privacy and data security.
Public cloud providers also often offer integrated security services, allowing the appropriate unit in your organization to automate aspects of monitoring and security. This not only reduces the effort required to configure the security infrastructure, but also helps the organization respond to events in a timely way, reducing overall risk. Multiple independent layers of security slow an attack and reduce its effectiveness, making a successful attack more difficult and costly to mount. Setting up infrastructure this way also lets data scientists work closely with IT and security partners while everyone focuses on their specialty, and helps reduce the overall time required to put products into production.
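The benefit of independent layers can be illustrated with a simple calculation. Assuming, purely for illustration, that each layer stops a given attack with some probability and that layers fail independently, the chance an attack gets past all of them is the product of each layer's miss probability. The probabilities below are made-up numbers, not measurements.

```python
from math import prod


def breach_probability(stop_probabilities):
    """Chance an attack passes every layer, assuming layers fail independently.

    Each entry is the (illustrative) probability that one layer stops the attack.
    """
    return prod(1.0 - p for p in stop_probabilities)


one_layer = breach_probability([0.9])
three_layers = breach_probability([0.9, 0.9, 0.9])
print(f"one layer:    {one_layer:.3f}")     # 0.100
print(f"three layers: {three_layers:.3f}")  # 0.001
```

Three layers that each stop 90% of attempts cut the illustrative breach rate from about 1 in 10 to about 1 in 1,000. Real layers are rarely fully independent, so this calculation tends to overstate the benefit when failures are correlated.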
The shared security model
Using public cloud infrastructure introduces the concept of a shared security model, in which the cloud provider is responsible for security of the cloud and the department is responsible for security in the cloud. In other words, the cloud provider ensures that its facilities and services are secure up to the point where the department starts using and configuring them. Exactly which aspects fall to which party depends on how the department uses the services.
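A simplified way to picture how the split shifts with the way services are used is to list the stack from the physical layer up and mark where the department's responsibility begins. The layer names and the split shown for infrastructure (IaaS), platform (PaaS) and software (SaaS) services are an illustrative generalization, not any provider's official responsibility matrix; real boundaries depend on the specific service and its terms.

```python
# Stack layers from the bottom up (illustrative, not any provider's official list).
LAYERS = [
    "physical facilities",
    "network infrastructure",
    "virtualization",
    "operating system",
    "runtime",
    "application",
    "data",
    "identity and access",
]

# Index of the first layer the department secures under each service model.
FIRST_DEPARTMENT_LAYER = {"IaaS": 3, "PaaS": 5, "SaaS": 6}


def responsibilities(service_model):
    """Split LAYERS into provider and department responsibilities."""
    split = FIRST_DEPARTMENT_LAYER[service_model]
    return {"provider": LAYERS[:split], "department": LAYERS[split:]}


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "->", responsibilities(model)["department"])
```

In this simplified view, the more managed the service, the fewer layers the department secures, but data and access control stay with the department in every model.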
A preliminary set of baseline controls is provided through the GC Cloud Guardrails, which help to ensure that cloud-based environments are protected upon receipt of an enrollment under the GC Cloud Services Framework Agreement. Work is also underway to automate the implementation of the guardrails across multiple cloud providers, helping to ensure they are applied consistently and quickly. With the baseline controls in place and new infrastructure configured automatically, data scientists can work with their IT partners to reuse common configurations, deploying their workloads more quickly while assuring clients that their data are secure.
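Automated baseline checks can be as simple as evaluating a resource's configuration against a list of rules. The minimal Python sketch below is loosely themed on areas the guardrails address, such as data location and protection of data at rest and in transit. The configuration keys, the region allow-list and every rule are hypothetical; they are not the official guardrail definitions or any provider's API.

```python
# Hypothetical region allow-list; confirm exact names with your provider.
ALLOWED_REGIONS = {"ca-central-1", "canadacentral", "canadaeast"}

# Hypothetical baseline rules over a simplified configuration dictionary.
# Each rule returns True when the configuration passes.
BASELINE_CHECKS = {
    "data location": lambda cfg: cfg.get("region") in ALLOWED_REGIONS,
    "encryption at rest": lambda cfg: cfg.get("encrypt_at_rest", False),
    "encryption in transit": lambda cfg: cfg.get("tls_only", False),
    "admin MFA": lambda cfg: cfg.get("admin_mfa", False),
}


def failed_checks(cfg):
    """Return the names of baseline rules the configuration does not meet."""
    return [name for name, passes in BASELINE_CHECKS.items() if not passes(cfg)]


storage_account = {
    "region": "canadacentral",
    "encrypt_at_rest": True,
    "tls_only": False,
    "admin_mfa": True,
}
print(failed_checks(storage_account))  # ['encryption in transit']
```

Keeping the rules as data makes it straightforward to run the same baseline against every new resource as it is provisioned, rather than reviewing configurations by hand.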
Just as the responsible use of cloud infrastructure requires a shift in how application architecture is implemented, it also requires a shift in how security controls are implemented. The basic requirements are the same, but cloud providers can show who made what change, and from where. This allows data scientists to focus on deploying high-performing models, while security personnel detect misconfigurations and noncompliance and respond quickly to keep risks from materializing.
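Because audit records capture who made what change and from where, a security team can scan them for changes that originate outside expected networks. This sketch assumes audit events have already been exported into simple records. The `AuditEvent` fields, the `unexpected_changes` function and the approved address range are hypothetical (the range shown is a reserved documentation block), and real provider log schemas differ.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical approved source ranges (192.0.2.0/24 is a reserved documentation block).
APPROVED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


@dataclass(frozen=True)
class AuditEvent:
    actor: str      # who made the change
    action: str     # what change was made
    resource: str   # which resource was changed
    source_ip: str  # where the request came from


def unexpected_changes(events):
    """Describe each change that came from outside the approved networks."""
    flagged = []
    for event in events:
        address = ipaddress.ip_address(event.source_ip)
        if not any(address in network for network in APPROVED_NETWORKS):
            flagged.append(
                f"{event.actor} performed {event.action} on "
                f"{event.resource} from {event.source_ip}"
            )
    return flagged


events = [
    AuditEvent("analyst1", "update_policy", "storage/raw", "192.0.2.15"),
    AuditEvent("svc-deploy", "disable_encryption", "storage/raw", "203.0.113.7"),
]
print(unexpected_changes(events))
# ['svc-deploy performed disable_encryption on storage/raw from 203.0.113.7']
```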
Cloud security vision for the Canadian public sector
The Canadian Centre for Cyber Security (CCCS) provides a means to monitor all cloud operations across multiple vendors, helping to catch distributed attacks. It acts as a support mechanism to the Security Operations Centre, helping to catch events before they escalate into large-scale issues. Through vendor evaluations, security documentation and cloud-based sensors, the CCCS provides an additional security mechanism and helps security practitioners and data scientists show their departments that they are managing the risks associated with using public cloud infrastructure.
The CCCS is building a network link, known as MapleTap, which can be placed at the perimeter of each virtual private cloud. MapleTap will be able to monitor all network traffic and provide a line of defence against multiple threats from its position behind the network appliance that provides encryption. This appliance is being built to scale with cloud traffic requirements, with all data processed in the cloud. Because it is part of the perimeter environment, data scientists can work behind it, passing data between processing nodes with confidence that perimeter traffic is being monitored for threats.
Cloud Based Sensors
The Cloud Based Sensor (CBS) agent is designed to unobtrusively support a variety of workloads by inspecting system and application logs. It supports an ever-growing list of log types and exposes the log stream to both the department and the Cyber Centre. This enables data scientists, along with multiple levels of GC security infrastructure, to monitor the same events at the same time, helping to detect threats and coordinate remediation faster than was traditionally possible with on-premises infrastructure. Because the agent supports such a wide range of log types, data scientists can also provide it with a feed of log data from their workloads.
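One way to make workload logs easy for a collection agent to ingest is to write them as structured JSON, one object per line. This Python sketch uses only the standard `logging` module. Whether the CBS agent accepts this particular format, and how it would be pointed at the output, are assumptions to confirm with your security team; `JsonLineFormatter`, `make_json_logger` and the `fields` convention are hypothetical names for illustration.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured fields, passed as logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_json_logger(name, handler):
    """Attach a JSON-lines handler to a named logger and return the logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger


log = make_json_logger("scoring", logging.StreamHandler(sys.stdout))
log.info("scored batch", extra={"fields": {"batch_id": 42, "rows": 1000}})
```

In a deployed workload the handler would more likely be a `logging.FileHandler` writing to a location the agent is configured to read; that location and the agent's configuration are outside this sketch.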
A whole of government approach
The CCCS can act as an enabler of cyber security not only for the GC, but for all Canadian organizations. Similarly, the GC Cloud Guardrails are a set of best practices for anyone deploying workloads in public cloud infrastructure. These work hand in hand with Shared Services Canada’s efforts, as part of the GC Cloud Brokering Service, to get public cloud vendors certified for PBMM workloads. The policies, practices and protections outlined here form a solid foundation on which departments, or any Canadian organization interested in protecting the privacy of Canadians, can build secure and reliable services. Together, they allow data science workloads that deliver services using protected data to be deployed at a manageable level of risk.
With the recent advances in cloud policy, this is an exciting time to be doing data science work in the GC. The opportunities to derive new insights and provide benefits to Canadians are at an all-time high. You can start moving your workloads into public cloud by reaching out to your IT partners and finding out how best to leverage your department’s Cloud Services Framework Agreement. If your department is not ready to use cloud services, reach out to the Data Analytics as a Service (DAaaS) team at Statistics Canada to see whether the DAaaS platform is right for you.