FileGPT.dev is designed as a private document intelligence platform. This page explains the current architecture, security controls, and practical limitations in transparent technical terms.
1. System architecture (high level)
- Authenticated user uploads PDF, DOCX, or TXT files.
- Files are stored in private object storage paths scoped per user.
- Files are parsed, chunked, and embedded for semantic retrieval.
- User queries retrieve the most relevant chunks from the indexed content.
- The query and the retrieved chunks are sent to model inference to generate a response.
- Response and citations are returned in chat and may be stored in session history.
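The retrieval flow above can be illustrated with a minimal sketch. It is not the platform's actual code: the function names, the fixed-size word chunking, and the toy bag-of-words "embedding" are all assumptions chosen so the example runs without an external embeddings API. The point is only that the prompt carries the top-ranked excerpts, not the whole document.

```python
import math
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding call: lowercase word counts."""
    return Counter(word.strip(".,;:!?").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, excerpts: list) -> str:
    """Assemble the model input from the query and numbered excerpts only."""
    context = "\n".join(f"[{i + 1}] {excerpt}" for i, excerpt in enumerate(excerpts))
    return f"Answer using only these excerpts:\n{context}\n\nQuestion: {query}"
```

In a real deployment the toy embedding would be replaced by calls to an embedding model and a vector index. The numbered excerpts are what would make citations possible in the response.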
2. Security principles
- Least privilege: authenticated endpoints and user-scoped access checks are used across document and chat operations.
- Data minimization: query flows are designed to send relevant excerpts rather than complete document archives in normal operation.
- Isolation: user data is logically separated by account-scoped identifiers and policy controls.
- Defense in depth: request validation, rate limits, and signed access URLs are used to reduce abuse risk.
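As an illustration of user-scoped access checks and account-scoped storage paths, here is a minimal sketch. The `Document` record, the `users/<user_id>/documents/...` path layout, and the in-memory `store` are hypothetical and are not taken from the FileGPT.dev codebase.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Document:
    doc_id: str
    owner_id: str
    storage_path: str


class AccessDenied(Exception):
    """Raised when the requested document is missing or not owned by the caller."""


def storage_path_for(user_id: str, doc_id: str, filename: str) -> str:
    """Build a per-user object path (hypothetical layout)."""
    # Keep only the final path component so "../" segments cannot escape the prefix.
    safe_name = PurePosixPath(filename).name
    return f"users/{user_id}/documents/{doc_id}/{safe_name}"


def get_document_for_user(store: dict, user_id: str, doc_id: str) -> Document:
    """Return the document only if it belongs to the authenticated user."""
    doc = store.get(doc_id)
    # Same error for "missing" and "not yours", so callers cannot probe for IDs.
    if doc is None or doc.owner_id != user_id:
        raise AccessDenied("document not found")
    return doc
```

Returning the same error for missing and foreign documents is one design choice among several. It avoids confirming that a given document ID exists in another account.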
3. Data handling details
- Uploaded raw files are stored to support document viewing, reprocessing, and user-controlled deletion.
- Extracted text, chunks, and embeddings are stored for retrieval performance.
- Chat sessions and messages are stored to support conversation history features.
- Usage metadata is stored for operational monitoring and service management.
- Customer-initiated deletion removes source documents and associated indexed data.
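The relationship between a source document and its derived data can be sketched as follows. This is an assumed in-memory model, not the platform's storage schema: the `UserIndex` class and its fields are hypothetical. It shows only the idea that deleting a document also removes its chunks and embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class UserIndex:
    """Hypothetical per-user store: raw file plus derived chunks and embeddings."""
    raw_files: dict = field(default_factory=dict)   # doc_id -> bytes
    chunks: dict = field(default_factory=dict)      # doc_id -> list of text chunks
    embeddings: dict = field(default_factory=dict)  # doc_id -> list of vectors

    def add_document(self, doc_id: str, raw: bytes, chunks: list, vectors: list) -> None:
        """Store the raw file together with its derived data."""
        if len(chunks) != len(vectors):
            raise ValueError("each chunk needs exactly one embedding vector")
        self.raw_files[doc_id] = raw
        self.chunks[doc_id] = list(chunks)
        self.embeddings[doc_id] = list(vectors)

    def delete_document(self, doc_id: str) -> bool:
        """Remove the source file and all indexed data derived from it."""
        existed = doc_id in self.raw_files
        self.raw_files.pop(doc_id, None)
        self.chunks.pop(doc_id, None)
        self.embeddings.pop(doc_id, None)
        return existed
```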
4. Encryption and transport
- Client-to-service and service-to-provider traffic is transmitted over TLS.
- Storage and database encryption at rest is provided by managed infrastructure providers.
- Short-lived signed URLs are used for controlled document preview access.
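A short-lived signed URL can be illustrated with an HMAC over the path and an expiry timestamp. This is a generic sketch and does not describe the storage provider's actual signing scheme: the secret, the `expires` and `sig` parameter names, and the 300-second default are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret-change-me"  # hypothetical key; a real one comes from a secret store


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry, hex-encoded."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Return the path with an expiry timestamp and signature appended."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at + ttl_seconds)
    return f"{path}?{urlencode({'expires': expires, 'sig': _signature(path, expires)})}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        provided = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parsed.path, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(provided.encode(), expected.encode())
```

Because the path is part of the signed message, a link issued for one document cannot be reused for another by editing the URL.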
5. AI model usage and scope
The platform uses external model APIs for embeddings and answer generation. Inputs may include user prompts, relevant excerpts, and limited chat context. Model provider processing may occur outside your primary geographic region depending on provider infrastructure.
We do not use Customer Content to train our own proprietary models. Third-party provider handling is governed by provider terms and contractual controls available to us.
6. Monitoring and abuse prevention
- Authentication checks on protected routes and server actions.
- Schema validation on ingestion and query requests.
- Per-user rate limiting on upload, ingest, query, and deletion endpoints.
- Server-side contact intake for walkthrough requests, with abuse protection.
- Operational logging for reliability and incident analysis.
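Two of the controls above, schema validation and per-user rate limiting, can be sketched in a few lines. The sliding-window algorithm, the limits, the `query` field name, and the maximum query length are illustrative assumptions, not the platform's actual configuration.

```python
import time
from collections import defaultdict, deque
from typing import Optional

MAX_QUERY_CHARS = 2000  # hypothetical limit


class SlidingWindowLimiter:
    """Allow at most max_requests per user and endpoint within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # (user_id, endpoint) -> request timestamps

    def allow(self, user_id: str, endpoint: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        hits = self._hits[(user_id, endpoint)]
        # Drop timestamps that have aged out of the window.
        while hits and current - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(current)
        return True


def validate_query_request(payload: object) -> str:
    """Return the trimmed query string, or raise ValueError for a malformed payload."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("query is too long")
    return query.strip()
```

An in-memory limiter like this only works within a single process. A multi-instance service would typically keep the counters in a shared store.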
7. Limitations and residual risk
- Generative outputs can still be incorrect even when citations are present.
- Third-party model providers are part of the processing chain for inference and embedding.
- No security system can eliminate all risk or guarantee that incidents will never occur.
- Configuration outside this codebase (cloud provider settings, enterprise network controls) may materially affect your risk profile.
8. Related documentation
See the Privacy Policy for personal data handling and the DPA for processor terms.
9. Security contact
Security and compliance questions can be sent to info@filegpt.dev.