Collection

Complex Scenario: Audit reports rely on Evidence and Projects
The audit report is the final deliverable that auditors produce, but it is only the tip of an iceberg: beneath it sits a large body of underlying data. In large enterprises, a single audit report might need to synthesize findings across multiple business units, geographies, and time periods.
The complexity arises because audit reports require relational integrity—every finding must trace back to verifiable evidence, and every evidence item must link to specific projects or business activities. Without a structured system, auditors spend 60-70% of their time manually hunting for data and cross-referencing spreadsheets rather than actually analyzing risk.
For a telco enterprise, one annual audit might touch 50+ internal systems, thousands of vendor contracts, and regulatory requirements spanning multiple jurisdictions. The audit report becomes a synthesis layer that needs to pull from two distinct but interconnected data domains: Evidence and Projects.
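The relational-integrity requirement above (every finding traces to evidence, every evidence item links to a project) can be checked mechanically once the data is structured. The sketch below is a minimal illustration under assumed, hypothetical record types (`Project`, `Evidence`, `Finding`) and a hypothetical helper `untraceable_findings`; it is not DATALOG's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified records for illustration; not DATALOG's actual schema.
@dataclass
class Project:
    project_id: str
    name: str

@dataclass
class Evidence:
    evidence_id: str
    kind: str                   # e.g. "invoice", "product_excel", "client_feedback"
    project_id: Optional[str]   # link to the project this evidence supports

@dataclass
class Finding:
    finding_id: str
    evidence_ids: List[str] = field(default_factory=list)

def untraceable_findings(findings, evidence, projects):
    """Return IDs of findings that break the Finding -> Evidence -> Project chain."""
    evidence_by_id = {e.evidence_id: e for e in evidence}
    project_ids = {p.project_id for p in projects}
    broken = []
    for f in findings:
        if not f.evidence_ids:            # a finding with no supporting evidence
            broken.append(f.finding_id)
            continue
        for eid in f.evidence_ids:
            e = evidence_by_id.get(eid)
            # Missing evidence, or evidence not linked to a known project, breaks the chain.
            if e is None or e.project_id not in project_ids:
                broken.append(f.finding_id)
                break
    return broken

projects = [Project("P1", "Marketing Outsourcing from Company A")]
evidence = [Evidence("E1", "invoice", "P1"), Evidence("E2", "product_excel", None)]
findings = [Finding("F1", ["E1"]), Finding("F2", ["E2"]), Finding("F3")]
print(untraceable_findings(findings, evidence, projects))  # ['F2', 'F3']
```

F2 fails because its spreadsheet evidence is not linked to any project, and F3 fails because it cites no evidence at all: exactly the gaps that folder structures and email chains tend to hide.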
Evidence represents the raw data layer, the foundational documentation that proves or disproves compliance, validates transactions, and supports audit conclusions.
Invoice documentation spans procurement invoices, payment records, tax documents, and vendor receipts. A single project might generate hundreds of invoices from different suppliers, each requiring verification against contracts, delivery confirmations, and budget approvals. These arrive in PDFs, scanned images, EDI formats, and email attachments with no standardized structure.
Product data in Excel is perhaps the most problematic evidence type. Different departments maintain their own spreadsheets with inconsistent naming conventions, version control nightmares, and formula errors that propagate across files. Inventory records, pricing sheets, and technical specifications often exist in dozens of Excel files that nobody can reconcile.
Client feedback comes through surveys, support tickets, email threads, call center transcripts, and social media mentions. This unstructured data needs to be categorized, sentiment-analyzed, and linked to specific products, services, or contracts to become meaningful audit evidence.
The challenge: auditors currently manage these evidence collections through folder structures, email chains, and institutional knowledge. When an auditor leaves, critical context disappears.
In large enterprises like telcos, projects create cascading complexity.
Marketing Outsourcing from Company A exemplifies vendor risk. This single project might involve contractual obligations worth millions, performance SLAs, data sharing agreements under privacy regulations, intellectual property transfers, and revenue-sharing arrangements. Auditing this requires evidence from legal (contracts), finance (payments and ROI tracking), marketing (campaign deliverables), and IT (data access logs). Each subsidiary company running similar outsourcing arrangements multiplies this complexity.
A contract to import new hardware introduces supply chain, customs compliance, quality assurance, and capital expenditure risks. A telco deploying new network infrastructure might have hardware contracts spanning multiple countries, each with different import duties, local partnership requirements, and technical certification needs. Evidence needed includes customs declarations, quality inspection reports, installation certifications, warranty documentation, and performance benchmarks.
When a telco operates through five subsidiary companies, each running twenty major projects annually, you're looking at 100+ project contexts that all need evidence linkage and audit coverage. The million-dollar impact isn't just the project value—it's the regulatory penalties, reputational damage, and operational failures that occur when audit can't trace problems to root causes.
Key concepts
Within each Project, Collections act as containers for related data. If you're familiar with databases, a Collection works like a Table in SQL. If you think in terms of file systems, it's similar to a Folder in your operating system.
Every file or record you upload into a Collection becomes an Asset—a single unit of data ready for processing, analysis, and knowledge extraction.
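To make the Project, Collection, and Asset hierarchy concrete, here is a minimal in-memory sketch. The class names mirror the concepts above, but the fields, the `upload` and `collection` methods, and the asset ID format are hypothetical and chosen only for illustration; they are not DATALOG's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Asset:
    # One uploaded file or record: roughly a row in a table, or a file in a folder.
    asset_id: str
    source: str                                           # e.g. the original filename
    values: Dict[str, Any] = field(default_factory=dict)  # column name -> extracted value

@dataclass
class Collection:
    # A container for related Assets: roughly a SQL table, or a folder.
    name: str
    assets: List[Asset] = field(default_factory=list)

    def upload(self, source: str) -> Asset:
        # Every upload becomes a new Asset in this Collection.
        asset = Asset(asset_id=f"{self.name}-{len(self.assets) + 1}", source=source)
        self.assets.append(asset)
        return asset

@dataclass
class Project:
    name: str
    collections: Dict[str, Collection] = field(default_factory=dict)

    def collection(self, name: str) -> Collection:
        # Return the named Collection, creating it on first use.
        if name not in self.collections:
            self.collections[name] = Collection(name)
        return self.collections[name]

project = Project("Marketing Outsourcing from Company A")
invoices = project.collection("Invoices")
first = invoices.upload("vendor_invoice_001.pdf")
second = invoices.upload("vendor_invoice_002.pdf")
print(first.asset_id, second.asset_id)             # Invoices-1 Invoices-2
print(len(project.collection("Invoices").assets))  # 2
```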
Collection Modes
Each Collection operates in one of two modes, depending on how you want to use the data:
Mode | Description | Knowledge sharing
Table | Structured data storage with custom columns | Not supported
Agent | Structured storage + builds shareable knowledge | Supported
Table Mode
Data is stored directly in a SQL database. You define as many columns as needed to capture the specific information you care about. Best for structured records where you need fast queries but don't require AI-driven knowledge sharing.
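As a rough picture of what "columns in a SQL database" means in Table mode, the sketch below uses Python's built-in `sqlite3` module. The table name, column names, and sample rows are hypothetical; DATALOG's actual storage engine and schema are not described here.

```python
import sqlite3

# Hypothetical user-defined columns for an "Invoices" Collection in Table mode.
columns = {
    "vendor_name": "TEXT",
    "invoice_date": "TEXT",   # ISO-8601 date string; SQLite has no native DATE type
    "total_amount": "REAL",
}

conn = sqlite3.connect(":memory:")
# Column names come from the fixed dict above, so building the DDL string is safe here.
col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
conn.execute(f"CREATE TABLE invoices (asset_id TEXT PRIMARY KEY, {col_defs})")

# Each Asset becomes one row; values are bound with placeholders.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        ("A1", "Company A", "2024-01-15", 1200.0),
        ("A2", "Company B", "2024-02-03", 540.5),
        ("A3", "Company A", "2024-03-20", 800.0),
    ],
)

# Structured records support fast, direct queries, e.g. total spend per vendor.
rows = conn.execute(
    "SELECT vendor_name, SUM(total_amount) FROM invoices "
    "GROUP BY vendor_name ORDER BY vendor_name"
).fetchall()
print(rows)  # [('Company A', 2000.0), ('Company B', 540.5)]
```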
Agent Mode
Includes everything Table mode offers, plus one critical addition: the Collection builds a private knowledge graph. This knowledge can be shared with other Collections, allowing AI agents to connect insights across your organization.
Use Agent mode when your data contains domain expertise that other teams or processes should access.
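The practical difference between the two modes can be summarized in a few lines: both store structured columns, but only an Agent-mode Collection contributes knowledge that others can draw on. This is a hypothetical sketch; `CollectionMode`, `CollectionConfig`, and `knowledge_sources` are illustrative names, not DATALOG identifiers.

```python
from dataclasses import dataclass
from enum import Enum

class CollectionMode(Enum):
    TABLE = "table"   # structured columns only
    AGENT = "agent"   # structured columns + a knowledge graph

@dataclass
class CollectionConfig:
    name: str
    mode: CollectionMode

    @property
    def builds_knowledge(self) -> bool:
        # Only Agent mode builds a knowledge graph.
        return self.mode is CollectionMode.AGENT

def knowledge_sources(collections):
    """Names of Collections whose knowledge other Collections could draw on."""
    return [c.name for c in collections if c.builds_knowledge]

cols = [
    CollectionConfig("Invoices", CollectionMode.TABLE),
    CollectionConfig("Vendor Contracts", CollectionMode.AGENT),
    CollectionConfig("Client Feedback", CollectionMode.AGENT),
]
print(knowledge_sources(cols))  # ['Vendor Contracts', 'Client Feedback']
```

A Collection of routine invoices can stay in Table mode, while contracts and client feedback, which carry domain expertise other teams need, are the natural candidates for Agent mode.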
Columns
Columns define what information you want to capture from each Asset in your Collection. When raw data enters DATALOG, columns determine how that data gets structured and what insights get extracted.
There are 7 column types divided into two categories:
Primitive Types (Extracted from Raw Data)
These columns capture information directly from your uploaded files and records.
Type | What it captures | Examples
Date | Timestamps, deadlines, effective dates | Invoice date, contract expiry
Number | Quantities, amounts, measurements | Total amount, quantity ordered
Text | Names, descriptions, categories | Vendor name, product description
Table | Nested tabular data within a document | Line items in an invoice
JSON | Structured data objects | API responses, configuration data
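Extraction from PDFs, scans, and spreadsheets is DATALOG's job, but the role of each primitive type is easier to see once a raw extracted value has to be turned into a typed one. The sketch below assumes raw values arrive as strings or nested lists; the `ColumnType` enum, the `coerce` helper, and the parsing rules (ISO dates, comma thousands separators) are hypothetical simplifications, not the product's behavior.

```python
import json
from datetime import date
from enum import Enum

class ColumnType(Enum):
    DATE = "date"
    NUMBER = "number"
    TEXT = "text"
    TABLE = "table"
    JSON = "json"

def coerce(column_type: ColumnType, raw):
    """Convert a raw extracted value into the column's primitive type (illustrative only)."""
    if column_type is ColumnType.DATE:
        return date.fromisoformat(raw)            # expects "YYYY-MM-DD"
    if column_type is ColumnType.NUMBER:
        return float(str(raw).replace(",", ""))   # "1,200.50" -> 1200.5
    if column_type is ColumnType.TEXT:
        return str(raw).strip()
    if column_type is ColumnType.TABLE:
        # Nested rows, e.g. invoice line items: every row must share the same keys.
        rows = list(raw)
        if rows and any(r.keys() != rows[0].keys() for r in rows):
            raise ValueError("all rows in a Table column must share the same keys")
        return rows
    if column_type is ColumnType.JSON:
        return json.loads(raw) if isinstance(raw, str) else raw
    raise ValueError(f"unsupported column type: {column_type}")

# Hypothetical column definitions and raw values for one invoice Asset.
schema = {
    "invoice_date": ColumnType.DATE,
    "total_amount": ColumnType.NUMBER,
    "vendor_name": ColumnType.TEXT,
    "line_items": ColumnType.TABLE,
    "api_response": ColumnType.JSON,
}
raw = {
    "invoice_date": "2024-01-15",
    "total_amount": "1,200.50",
    "vendor_name": "  Company A ",
    "line_items": [{"sku": "RTR-1", "qty": 2}, {"sku": "SW-9", "qty": 1}],
    "api_response": '{"status": "paid"}',
}
record = {name: coerce(ctype, raw[name]) for name, ctype in schema.items()}
print(record["total_amount"], record["vendor_name"])  # 1200.5 Company A
```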
Advanced Types (System-Enhanced)
These columns go beyond extraction—they add intelligence and automation to your data.
Type | What it does | Examples
Static | Fixed values set manually by users or via API | Status flags, manual classifications, department codes
Agent | Delegates another AI Agent to review each Asset and write its result to the column | Risk assessment, sentiment analysis, compliance check
Agent columns are powerful. Instead of manually reviewing thousands of documents, you assign a specialized Agent to analyze each Asset and fill in the column automatically. For example, an Agent could read every contract and populate a "Risk Level" column with High, Medium, or Low based on your custom criteria.
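The difference between the two advanced types is who writes the value: a Static column is written once by a user or an API call, while an Agent column is filled by running a reviewer over every Asset. In the sketch below a deterministic function stands in for the AI Agent, and its "Risk Level" thresholds are made-up example criteria; the class and function names are hypothetical, not DATALOG's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class AdvancedColumns:
    assets: Dict[str, Dict[str, Any]] = field(default_factory=dict)  # asset_id -> column values
    agent_columns: Dict[str, Callable[[Dict[str, Any]], str]] = field(default_factory=dict)

    def set_static(self, asset_id: str, column: str, value: str) -> None:
        # Static column: a fixed value written by a user or an API call; never recomputed.
        self.assets[asset_id][column] = value

    def run_agents(self) -> None:
        # Agent column: a reviewer runs once per Asset and its result fills the column.
        for values in self.assets.values():
            for column, reviewer in self.agent_columns.items():
                values[column] = reviewer(values)

def risk_reviewer(contract: Dict[str, Any]) -> str:
    """Deterministic stand-in for an AI Agent, using made-up example criteria."""
    value = contract.get("contract_value", 0)
    cross_border = contract.get("cross_border", False)
    if value >= 1_000_000 or (cross_border and value >= 250_000):
        return "High"
    if value >= 100_000 or cross_border:
        return "Medium"
    return "Low"

contracts = AdvancedColumns(
    assets={
        "C1": {"contract_value": 2_000_000, "cross_border": False},
        "C2": {"contract_value": 150_000, "cross_border": True},
        "C3": {"contract_value": 50_000, "cross_border": False},
    },
    agent_columns={"Risk Level": risk_reviewer},
)
contracts.set_static("C1", "Department", "Marketing")
contracts.run_agents()
print({k: v["Risk Level"] for k, v in contracts.assets.items()})
# {'C1': 'High', 'C2': 'Medium', 'C3': 'Low'}
```

In the real product the reviewer would be an AI Agent reading the full contract; the point of the sketch is only the shape of the workflow: define the criteria once, then let the column be populated for every Asset.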
Agent Knowledge (RAG)
Suppose your organization updates its Product and Merchant policies very frequently.
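Frequent policy updates are the classic case for retrieval-augmented generation: instead of relying on what a model memorized, the relevant policy text is retrieved at question time, so the newest version is what the agent sees. The sketch below is a generic illustration of that retrieval step, not DATALOG's implementation; `PolicyDoc`, the word-overlap scoring (a stand-in for embedding search), and the sample policies are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PolicyDoc:
    policy_id: str
    effective: date
    text: str

def latest_versions(docs):
    # Keep only the newest version of each policy, so superseded rules are never retrieved.
    newest = {}
    for d in docs:
        if d.policy_id not in newest or d.effective > newest[d.policy_id].effective:
            newest[d.policy_id] = d
    return list(newest.values())

def retrieve(question, docs, k=1):
    # Toy relevance score: count of shared lowercase words (real systems use embeddings).
    q = set(question.lower().split())
    ranked = sorted(
        latest_versions(docs),
        key=lambda d: len(q & set(d.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, docs):
    # Ground the model's answer in the retrieved, current policy text.
    context = "\n".join(f"[{d.policy_id} @ {d.effective}] {d.text}" for d in retrieve(question, docs))
    return f"Answer using only this policy context:\n{context}\n\nQuestion: {question}"

docs = [
    PolicyDoc("merchant-refund", date(2024, 1, 1), "merchant refund window is 30 days"),
    PolicyDoc("merchant-refund", date(2024, 6, 1), "merchant refund window is 14 days"),
    PolicyDoc("product-warranty", date(2024, 3, 1), "product warranty lasts 12 months"),
]
top = retrieve("what is the merchant refund window", docs)
print(top[0].text)  # merchant refund window is 14 days
```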
