You need access to their data to process it, so any layer of indirection (like a database they control) is additional complexity without meaningful benefit. For clients with strict data control requirements, self-hosting the whole system is the standard solution (with a very high licensing fee).
Something to keep in mind is that some clients are not operating in good faith: their goal isn't to work together to find a solution but to present roadblocks. The reasoning can be complicated; perhaps there's internal politics around which solution to use, or perhaps your solution is receiving pushback because it's not the preferred solution of one stakeholder. You'll probably never know the true motivations, so it's important not to get caught up in engineering a solution to a problem that doesn't really exist.
You've mentioned that the data you need access to is code: GitHub is a perfect comparable. GitHub's cloud service is used by the majority of companies with code; in fact, I'd guess even your clients are using GitHub's hosted services. If the problem is that your company doesn't have the reputation necessary to give these clients confidence that you can securely manage their code, that may just be a sign that, right now, these clients aren't the right fit for you, and you should work with less antsy clients until you have built up that credibility.
ukoki 16 days ago [-]
> their goal isn't to work together to find a solution but to present roadblocks. The reasoning can be complicated...
Or as simple as “the less I appear to value this solution, the lower the supplier will estimate my maximum price for it”
alliewithane 15 days ago [-]
That is very valid. My problem is that a large portion of my possible clients seemed to be happy with the idea of the solution I provided. I was looking at it tech-wise because I had somewhat validated it for my current client space.
Self-hosting seems like the most reliable option for the time being (or executing functions on the encrypted data without decrypting it). However, is it standard practice to use Kubernetes to give them a preconfigured database that they can deploy on my own cloud? I wouldn't access the code except temporarily, through a little script that talks to my cloud and comes along with the database in the pod that they "self host." Would that be considered standard practice?
aimazon 13 days ago [-]
No, that wouldn't be considered standard practice. Fundamentally, if you are able to control the code that executes then you can exfiltrate the data regardless of how it is stored. The reason self-hosting is a secure way to execute code against data is because it removes the code from your control: with self-hosting, you would give your code over to the client and then they would run it in their environment.
Providing your customers with their own database in your environment is a method for segregating their data and ensuring that there's no unintentional co-mingling of their data with other customers (which is a common problem in a multi-tenant environment) but it does not protect the customer data from being accessed by you: if code you are executing can access the data, then you can access the data.
Reading between the lines ("a large portion of my possible clients seemed to be happy with the idea of the solution I provided") it sounds like my initial understanding of the situation was incorrect: I thought that you had been asked to build this specific architecture by your clients but it sounds like it's the opposite: you've had an idea, come up with an architecture and then validated that idea with potential clients by describing the architecture? Is that correct?
If that's the actual situation, I think this is a much simpler problem to solve. Architecture is architecture; it isn't part of the solution, it's a means to an end. There are a very small number of clients who may have strict security/compliance requirements that do necessitate this sort of complexity (which is where self-hosting comes in), but for the majority of clients, how the product works is immaterial; they care only about the results.
Realising that you've made a terrible mistake when building a system using the architecture you designed 6 months ago is a rite of passage; it is the process. Every vision you have today for how your system will work is probably going to be wrong 6 months from now. That's completely normal: you will learn more about how your system should work in 1 month of building than you would in 6 months of planning.
Try to take a step back from thinking about architecture. One of the biggest dangers when working on an early-stage technology product is committing yourself to a technical direction that then dictates the product direction. If, for example, you decide today to build a system in which clients self-host the database that your code accesses, and then you decide you want to build a feature that requires 10x as many queries to the database, oops, you can't build that, because it would require your clients to upgrade their self-hosted database resources, and getting them to do that will be all but impossible.
If you want to share more about your idea, I can outline some ideas about how I might approach building it in a cheap way that allows for validating the idea. There are exceptions, but nowadays, given the maturity of the software development space, most ideas can be built and launched to validate with real customers in 1 month. If your vision for how you'll build something requires 3, 6 or 12 months to get customers using it, it's probably overcomplicated.
curious_curios 19 days ago [-]
Two options I’ve seen:
Customer Managed Keys - You have everything encrypted in your database via a key the customer has. You request (likely automated) that key every time you process the data. They can revoke at any point, and have an audit log of every access.
Self Hosting - Let the customer host your solution themselves or automate spinning up a cloud environment for them that they have full control over.
Both are kind of a pain to implement, but that lets you charge more for these enterprise features.
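As a rough illustration of the Customer Managed Keys option, here is a minimal sketch assuming AWS KMS and boto3; the key ARN, helper names, and one-key-per-customer layout are illustrative assumptions, not a prescribed implementation:

```python
# Sketch: the customer owns a KMS key in their AWS account and grants our
# service role kms:Encrypt/kms:Decrypt on it via the key policy. Every use
# shows up in their CloudTrail, and revoking access cuts us off immediately.
import boto3

kms = boto3.client("kms")

# Hypothetical ARN the customer hands over during onboarding.
CUSTOMER_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"

def encrypt_for_customer(plaintext: bytes) -> bytes:
    # Note: kms.encrypt is limited to 4 KB of plaintext; for larger payloads
    # you'd use generate_data_key and envelope encryption instead.
    resp = kms.encrypt(KeyId=CUSTOMER_KEY_ARN, Plaintext=plaintext)
    return resp["CiphertextBlob"]

def decrypt_for_customer(ciphertext: bytes) -> bytes:
    # Fails with AccessDeniedException the moment the customer revokes access.
    resp = kms.decrypt(KeyId=CUSTOMER_KEY_ARN, CiphertextBlob=ciphertext)
    return resp["Plaintext"]
```

The audit log and revocation mentioned above fall out of the key living in the customer's account rather than yours.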
alliewithane 18 days ago [-]
I see. I heard about "fully homomorphic encryption", which is faster to implement and allows you to run code on encrypted data, but the time complexity is O((10^6) * n), which is insane.
bobbiechen 16 days ago [-]
Confidential Computing also provides data-in-use protection and has a significantly more realistic overhead, often <10% in real-world workloads I've seen. However, in this case you might want to combine it with customer managed keys (BYOK) or self-hosting anyways - otherwise the customer has no opportunity to perform remote attestation and prove you're really running in Confidential Computing.
The visualization about halfway down https://www.anjuna.io/solution/secure-ai (my employer) is an example of the self-hosted flavor of this. Happy to discuss deeper, my contact info is in my bio.
roetlich 12 days ago [-]
> O((10^6) * n)
Isn't that O(n)? Is there a typo or am I missing something?
abrookewood 16 days ago [-]
So we recently had to do something like this for PCI DSS certification. The database is encrypted at rest (AWS RDS), but the data is presented as clear text to any DBA. The solution we came up with was to add field-level encryption to certain Cardholder Data (CHD) fields like Account etc. To do this, we use AWS KMS to encrypt/decrypt the data, and we only grant the rights to use this key to an IAM role that the database holds, explicitly preventing any Admin accounts from accessing it. The end result is that Admins can manage the database, but can't see all of it in the clear.
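Roughly, the field-level part could look like this sketch, assuming boto3 plus the cryptography package for the symmetric step; the key alias and storage layout are assumptions for illustration, not our exact setup:

```python
# Sketch of field-level (envelope) encryption for a CHD column: KMS wraps a
# per-record data key, and only the IAM role attached to the application is
# allowed kms:Decrypt on the CHD key, so admins see only ciphertext.
import base64
import boto3
from cryptography.fernet import Fernet

kms = boto3.client("kms")
CHD_KEY_ID = "alias/chd-field-key"  # hypothetical key alias

def encrypt_field(value: str) -> dict:
    dk = kms.generate_data_key(KeyId=CHD_KEY_ID, KeySpec="AES_256")
    f = Fernet(base64.urlsafe_b64encode(dk["Plaintext"]))
    return {
        "ciphertext": f.encrypt(value.encode()),
        "wrapped_key": dk["CiphertextBlob"],  # stored alongside the row
    }

def decrypt_field(record: dict) -> str:
    # Only principals allowed by the key policy can unwrap the data key.
    key = kms.decrypt(CiphertextBlob=record["wrapped_key"])["Plaintext"]
    f = Fernet(base64.urlsafe_b64encode(key))
    return f.decrypt(record["ciphertext"]).decode()
```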
cocoa19 16 days ago [-]
Why do they hate the idea?
It’s not clear what the core problem is. Are they contractually or by law obligated to comply with security/privacy requirements? Are they afraid you’ll misuse their data (steal their business, etc.)?
If you can be explicit about what “hate” means, you can find a solution, or decide this is not a potential customer.
alliewithane 15 days ago [-]
They are not comfortable with the fact that I can look at their code base whenever I want.
chiph 16 days ago [-]
Are you using one database per customer or a shared database (with an additional key on the tables)?
Because enterprise clients are going to want their own database, which has its own licensing and operating costs - costs that you should be building into your price. And since they will have their own database, it can be encrypted with a key that is unique to them.
For small business customers, a shared database is the only way to stay profitable.
VTimofeenko 16 days ago [-]
Disclaimer: I work for Snowflake.
This idea (customer owns the data, code is deployed next to the data, data never leaves customer perimeter) is the exact use case for the native application framework:
https://docs.snowflake.com/en/developer-guide/native-apps/na...
Confidential Computing is a way in which cloud providers let their customers encrypt data “in-use” - that might be what you’re looking for.
alliewithane 14 days ago [-]
Sounds like it's exactly what I need. Thank you!
tonygiorgio 16 days ago [-]
Yeah, exactly this. Especially if you need to programmatically process that data too. You can even let the customers provide their own managed key (such as an externally managed AWS KMS key) in combination with something like AWS Nitro Enclaves.
I’ve enjoyed building on Nitro myself, and most things should run in it just fine; you just need to build the networking vsock proxy into the Nitro image for anything that needs networking (such as the DB, where you store the encrypted-at-rest data).
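For a sense of what that proxy involves, here's a stripped-down sketch of a parent-instance vsock-to-TCP forwarder; the port numbers and DB endpoint are assumptions, and a real setup would add error handling and likely TLS originating from inside the enclave:

```python
# Sketch: the enclave only has vsock, so it dials this proxy on the parent EC2
# instance, which forwards bytes to the real TCP endpoint (e.g. the database).
import socket
import threading

VSOCK_PORT = 8001                        # port the enclave connects to (assumed)
DB_HOST, DB_PORT = "db.internal", 5432   # hypothetical database endpoint

def pipe(src: socket.socket, dst: socket.socket) -> None:
    # Copy bytes one way until the source side closes.
    while data := src.recv(4096):
        dst.sendall(data)
    dst.close()

def main() -> None:
    listener = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    listener.bind((socket.VMADDR_CID_ANY, VSOCK_PORT))
    listener.listen()
    while True:
        enclave_conn, _ = listener.accept()
        db_conn = socket.create_connection((DB_HOST, DB_PORT))
        threading.Thread(target=pipe, args=(enclave_conn, db_conn), daemon=True).start()
        threading.Thread(target=pipe, args=(db_conn, enclave_conn), daemon=True).start()

if __name__ == "__main__":
    main()
```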
rozenmd 18 days ago [-]
Do they hate that it's unencrypted in the DB, or that the DB's storage itself is unencrypted?
(for my business, anyway) I've found this wording to be enough for bigger customers:
Data is stored on AWS RDS, encrypted at rest by an industry standard AES-256 encryption algorithm (more on that here: https://aws.amazon.com/rds/features/security/)
My main problem is that I need to do operations on the data while it's in the DB. This means that I cannot leave it encrypted end-to-end there.
atmosx 18 days ago [-]
When RDS is encrypted at rest, it means that the data stored in the database is encrypted while it resides on disk. This means the data is protected against unauthorised access to the raw storage.
The data accessed by the app is not encrypted; you can still work on the data as you usually would. It's mostly a compliance thing. Not sure what level of security it _actually_ brings to the data itself, but most companies are okay with "encryption at rest".
UltraSane 16 days ago [-]
Encryption at rest is meant to protect data when the storage device is stolen or lost.
cr125rider 16 days ago [-]
Sure you can. You just can’t do zero knowledge encryption.
alliewithane 15 days ago [-]
How is that possible?
oceanparkway 16 days ago [-]
I would ask them what their ideal setup is and then compare feasibility. There are probably a lot of indirections/hoops you could jump through, but if your security concerns are being driven by your customers, you should probably ask them. If it is the case that you need to access their unencrypted data, then at one point or another you're going to have to do it; the question is which approach your customers would feel happiest about: an on-premises contract, storing data encrypted with customer-specific decrypt keys behind a managed auth service, etc.
williamtrask 16 days ago [-]
I lead an open source nonprofit which deploys things like this. Feel free to shoot me a DM on Twitter. Handle is @iamtrask
ezekg 18 days ago [-]
Sounds overly complicated. Use at-work encryption (i.e. encrypt it in the database), on top of encryption in-transit and at-rest, hosted/managed by a reputable database vendor. If that won't fly, then I agree with the (enterprise) self-hosted offering another commenter mentioned.
alliewithane 18 days ago [-]
The problem is that I cannot do that. I need to run code on the data, which means I can theoretically access the data at any time, and my client is super uncomfortable with that, considering I need to access their code base.
ezekg 18 days ago [-]
Are they uncomfortable with you accessing their data, or are they uncomfortable with you storing their data unencrypted, risking their IP in case of a breach? Two different things.
The former means they aren't a fit for SaaS (i.e. offer self-hosting), and the latter means you can use at-work encryption, only decrypting the data to process it.
Without more info on what you're actually building, I can't really be of more help here.
alliewithane 15 days ago [-]
It's the latter. It's pretty much an agent for their GitHub repo. The agent needs access to their code and keeps some kind of knowledge that it generated in a tree database. Wouldn't it be considered a red flag that I can access their data whenever I want? If I used at-work encryption, that just means that I have the ability to access the data whenever I want. However, if they did some sort of self-hosting, then I can only access it temporarily via APIs, and thus I can only access the data when they want.
austin-cheney 18 days ago [-]
Why not run the database in a docker container, one for each client? They could even run on the same machine.
alliewithane 18 days ago [-]
That makes sense; I could add some code in the container that can communicate with private APIs on my servers. Is this standard practice or just an ad hoc solution?
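To make the per-client container idea concrete, here's a hedged sketch using the docker Python SDK (docker-py); the image choice, naming scheme, and secrets handling are illustrative assumptions rather than a hardened setup:

```python
# Sketch: spin up one isolated Postgres container per client, each with its
# own credentials, so clients never share a database process.
import secrets
import docker

def provision_client_db(client_id: str) -> dict:
    engine = docker.from_env()
    password = secrets.token_urlsafe(24)
    container = engine.containers.run(
        "postgres:16",
        name=f"client-db-{client_id}",
        environment={"POSTGRES_PASSWORD": password},
        ports={"5432/tcp": None},  # let Docker assign a free host port
        detach=True,
    )
    container.reload()             # refresh attrs to read the assigned port
    host_port = container.ports["5432/tcp"][0]["HostPort"]
    return {"container_id": container.id, "port": host_port, "password": password}
```

Note that this only isolates clients from each other; as discussed elsewhere in the thread, whoever runs the code against the data can still read the data.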