The original post: /r/selfhosted by /u/thirimash on 2024-09-30 08:44:35.

Hi all,

I am an IT administrator at a company that develops its own software. We have a fairly extensive database of technical documentation and manuals that our developers use on a regular basis. Recently, I've noticed that some of the team have started using tools like ChatGPT to support their work. While I realize the value such tools can bring, I'm starting to worry about security, especially the possibility of unknowingly sharing company data with outside parties.

My question is: have any of you had to deal with a similar challenge? How have you handled data protection when using large language models (LLMs) such as ChatGPT? Or do you have experience implementing self-hosted LLMs that can handle several users simultaneously (in our case, roughly 4-5 concurrent sessions)? The development team is about 50 people, but I don't foresee everyone using the tool at the same time.
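To give a sense of the load I mean, the usage pattern is roughly what a quick smoke test like the one below would simulate. This is only a sketch, assuming a self-hosted model server that exposes an OpenAI-compatible endpoint (e.g. Ollama or vLLM); the host, port, and model name are placeholders, not anything we run today:

```python
# Sketch: fire 5 concurrent chat requests at a self-hosted, OpenAI-compatible
# endpoint (e.g. Ollama or vLLM) to see how it copes with the 4-5 parallel
# sessions mentioned above. Host, port, and model name are placeholders.
from concurrent.futures import ThreadPoolExecutor
import requests

BASE_URL = "http://localhost:11434/v1"  # placeholder: Ollama's default port
MODEL = "llama3.1:8b"                   # placeholder model name

def ask(prompt: str) -> str:
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# 5 independent prompts, answered in parallel, as a stand-in for 5 developers.
prompts = [f"Summarise section {i} of our deployment manual." for i in range(1, 6)]
with ThreadPoolExecutor(max_workers=5) as pool:
    for answer in pool.map(ask, prompts):
        print(answer[:80], "...")
```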

Ideally, I'd like a web interface with login, accessed over HTTPS. I'm also thinking about exposing an API, although that may be more complex and could require additional work to build a web application on top of it.
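For the API side, what I picture is developers' scripts and tools pointed at our own endpoint instead of OpenAI's. A rough sketch of what that could look like, assuming the self-hosted server speaks the OpenAI-compatible API (as Ollama, vLLM, and Open WebUI do) behind a TLS-terminating reverse proxy that checks a per-user token; the hostname, token, and model name below are made up:

```python
# Sketch only: a developer script calling our *internal* LLM endpoint over HTTPS.
# Assumes an OpenAI-compatible API behind a reverse proxy doing TLS + auth.
# Hostname, token, and model name are placeholders.
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://llm.internal.example.com/v1",  # placeholder internal hostname
    api_key="per-user-token-issued-by-us",           # placeholder token
)

resp = client.chat.completions.create(
    model="llama3.1:8b",  # placeholder model name
    messages=[{"role": "user", "content": "Explain what this stack trace means."}],
)
print(resp.choices[0].message.content)
```

If something like that works, existing tools that already speak the OpenAI API could presumably just be re-pointed at the internal base URL, so the extra web application might be less work than I feared.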

Additionally, I’m wondering how best to approach limiting the use of third-party models in developers’ day-to-day work without restricting their access to valuable tools. Do you have any recommendations for security policies or configurations that could help in such a case?

Any suggestion or experience on this topic would be very helpful!

Thanks for any advice!