📚 Documentation & Help

How can we help you?

Everything you need to know about generating realistic synthetic data for your projects

🚀

Quick Start Guide

Generate your first dataset in 4 simple steps

1. Choose Your Method

Upload an existing schema file (JSON, YAML, or CSV) or use the visual Schema Builder to create one from scratch. Try loading an example schema to see how it works.

2. Define Your Data Structure

Add entities (tables) and fields (columns). Choose from 100+ field types including names, emails, addresses, UUIDs, dates, and more.

3. Configure Settings

Set the number of records, choose your output format (JSON, CSV, SQL, or XML), select a locale, and optionally add realistic noise/errors.

4. Generate & Export

Click "Generate Data" and watch the magic happen. Copy to clipboard or download your synthetic dataset instantly.

💡
Pro Tip

Start with an example schema (Users, E-commerce, or Employees) to understand the structure, then modify it for your needs.

🏗️

Using the Schema Builder

Create custom data structures without writing code

📦

Adding Entities

Entities represent tables or collections. Click "+ Add Entity" and give it a name like "users", "products", or "orders". Each entity will generate its own set of records.

📝

Adding Fields

Fields are the columns in your entity. Specify the field name, choose a type from 100+ options, and optionally mark it as nullable or unique.

⚙️

Field Options

Some types have extra options: Enum lets you specify custom values, Number/Integer accept min/max ranges, and Reference creates foreign key links.

🔗

Relationships

Connect entities with 1:1, 1:N, or N:N relationships. The generator ensures referential integrity across your dataset.
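To make "referential integrity" concrete, here is a minimal Python sketch of a 1:N relationship between users and orders. It is an illustration only, not the generator's own code: the function name `generate_users_and_orders` and the record shapes are hypothetical.

```python
import random
import uuid


def generate_users_and_orders(n_users, n_orders, seed=0):
    """Generate parent rows first, then child rows that point at existing parents."""
    rng = random.Random(seed)
    # Building UUIDs from seeded random bits keeps the IDs reproducible.
    users = [
        {"id": str(uuid.UUID(int=rng.getrandbits(128), version=4))}
        for _ in range(n_users)
    ]
    user_ids = [user["id"] for user in users]
    # 1:N relationship: each order picks one existing user as its owner.
    orders = [
        {"id": i + 1, "user_id": rng.choice(user_ids)}
        for i in range(n_orders)
    ]
    return users, orders


users, orders = generate_users_and_orders(n_users=5, n_orders=20)
known_ids = {user["id"] for user in users}
# Referential integrity: every foreign key resolves to an existing user.
assert all(order["user_id"] in known_ids for order in orders)
```

Generating the parent entity before the child is what makes the final check pass: a child row can only reference an ID that already exists.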

📤

Uploading Schema Files

Import existing schemas in JSON, YAML, or CSV format

JSON
{
  "entities": [
    {
      "name": "users",
      "fields": [
        { "name": "id", "type": "uuid" },
        { "name": "email", "type": "email" },
        { "name": "name", "type": "fullName" },
        { "name": "age", "type": "integer", "min": 18, "max": 65 },
        { "name": "status", "type": "enum", "values": ["active", "pending", "inactive"] }
      ]
    }
  ]
}
YAML
entities:
  - name: users
    fields:
      - name: id
        type: uuid
      - name: email
        type: email
      - name: name
        type: fullName
      - name: created_at
        type: datetime
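
Before uploading a schema file, it can help to sanity-check its shape. The Python sketch below loads a trimmed version of the JSON example above and looks for a few common mistakes. The rules in `check_schema` are assumptions chosen for illustration; they are not the validation the generator itself performs.

```python
import json

SCHEMA_TEXT = """
{
  "entities": [
    {
      "name": "users",
      "fields": [
        { "name": "id", "type": "uuid" },
        { "name": "age", "type": "integer", "min": 18, "max": 65 },
        { "name": "status", "type": "enum", "values": ["active", "pending", "inactive"] }
      ]
    }
  ]
}
"""


def check_schema(schema):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    if not schema.get("entities"):
        problems.append("schema has no entities")
    for entity in schema.get("entities", []):
        entity_name = entity.get("name", "<unnamed>")
        for field in entity.get("fields", []):
            field_name = field.get("name", "<unnamed>")
            if "name" not in field or "type" not in field:
                problems.append(f"{entity_name}.{field_name}: needs both name and type")
            if field.get("type") == "enum" and not field.get("values"):
                problems.append(f"{entity_name}.{field_name}: enum needs values")
            if "min" in field and "max" in field and field["min"] > field["max"]:
                problems.append(f"{entity_name}.{field_name}: min is greater than max")
    return problems


problems = check_schema(json.loads(SCHEMA_TEXT))
print(problems or "schema looks fine")
```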
📋

Field Types Reference

100+ field types organized by category

Category | Types | Example Output
Basic | string, number, integer, boolean, date, datetime | "hello", 42.5, true, "2024-01-15"
Personal | firstName, lastName, fullName, email, phone, username, age, gender | "John", "Doe", "john.doe@email.com"
Address | address, street, city, state, country, zipCode, latitude, longitude | "123 Main St", "New York", "10001"
Business | company, jobTitle, department, product, price, creditCard, iban | "Acme Corp", "Engineer", "$99.99"
Internet | url, domain, ip, ipv6, mac, userAgent | "https://example.com", "192.168.1.1"
Identifiers | uuid, id, mongoId, nanoid, slug | "550e8400-e29b-41d4-a716..."
Text | word, words, sentence, paragraph, lorem | "Lorem ipsum dolor sit amet..."
Custom | enum, weightedEnum, regex, reference | Values from your custom list
🎛️

Error & Noise Settings

Add realistic imperfections for testing

Null Values Rate

Randomly replaces values with NULL to test handling of missing data. Recommended: 5-15%

Tags: Missing Data, Validation
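
As a rough illustration of what a null rate means, the Python sketch below replaces each value with `None` with a fixed probability. The function name `inject_nulls` and the per-value interpretation of the rate are assumptions, not the generator's documented behaviour.

```python
import random


def inject_nulls(records, rate, seed=0):
    """Replace each value with None with probability `rate` (0.0 to 1.0)."""
    rng = random.Random(seed)
    return [
        {key: (None if rng.random() < rate else value) for key, value in record.items()}
        for record in records
    ]


rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(1000)]
noisy = inject_nulls(rows, rate=0.10)

# Share of values that ended up as None; it should land near the requested 10%.
null_share = sum(v is None for r in noisy for v in r.values()) / (2 * len(noisy))
print(f"null share: {null_share:.1%}")
```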

Typo Rate

Introduces character swaps, deletions, and substitutions in text. Great for testing fuzzy matching.

Tags: Text Quality, Spell Check
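
The three typo kinds mentioned above (swaps, deletions, substitutions) can be sketched in a few lines of Python. This is a hypothetical illustration: `add_typo` and `apply_typos` are made-up names, and the generator may choose positions and characters differently.

```python
import random
import string


def add_typo(text, rng):
    """Apply one random typo: swap neighbours, delete a character, or substitute one."""
    if len(text) < 2:
        return text
    # i is never the last index, so a swap always has a right-hand neighbour.
    i = rng.randrange(len(text) - 1)
    kind = rng.choice(["swap", "delete", "substitute"])
    if kind == "swap":
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if kind == "delete":
        return text[:i] + text[i + 1:]
    # A substitution may occasionally pick the same letter and leave the text unchanged.
    return text[:i] + rng.choice(string.ascii_lowercase) + text[i + 1:]


def apply_typos(values, rate, seed=0):
    """Give each value one typo with probability `rate`."""
    rng = random.Random(seed)
    return [add_typo(v, rng) if rng.random() < rate else v for v in values]


names = ["John Doe", "Jane Smith", "Acme Corp"]
print(apply_typos(names, rate=1.0))
```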

Format Error Rate

Creates malformed emails, dates, and formatted fields. Perfect for testing input validation.

Tags: Validation, Edge Cases
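
A small Python sketch of a format error, using emails as the example. The three ways of breaking an address and the deliberately simple `looks_like_email` check are assumptions for illustration; real validators and the generator's own malformed output may differ.

```python
import random
import re


def break_email(email, rng):
    """Return a malformed variant of a well-formed email address."""
    local, _, domain = email.partition("@")
    kind = rng.choice(["no_at", "no_domain", "double_dot"])
    if kind == "no_at":
        return local + domain              # missing "@"
    if kind == "no_domain":
        return local + "@"                 # nothing after "@"
    return local + "@" + domain.replace(".", "..", 1)  # empty domain label


def looks_like_email(value):
    """Very loose check: something@label.label with no empty domain labels."""
    return re.fullmatch(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+", value) is not None


rng = random.Random(0)
broken = break_email("john.doe@email.com", rng)
print(broken, looks_like_email(broken))
```

Feeding values like these into your own validation code is a quick way to confirm that it rejects them.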

Duplicate Rate

Adds duplicate records to simulate real-world data entry errors. Test your deduplication logic.

Tags: Data Quality, Deduplication
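
The Python sketch below shows one way to picture a duplicate rate, together with a simple exact-match deduplication step to test against it. Treating the rate as "extra copies relative to the original record count" is an assumption, and both function names are hypothetical.

```python
import random


def add_duplicates(records, rate, seed=0):
    """Append round(len(records) * rate) copies of randomly chosen records, then shuffle."""
    rng = random.Random(seed)
    extra = round(len(records) * rate)
    copies = [dict(rng.choice(records)) for _ in range(extra)]
    result = records + copies
    rng.shuffle(result)
    return result


def deduplicate(records):
    """Keep the first occurrence of each exactly identical record."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


rows = [{"id": i, "name": f"user{i}"} for i in range(100)]
noisy = add_duplicates(rows, rate=0.10)
print(len(noisy), len(deduplicate(noisy)))
```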

Outlier Rate

Generates extreme numeric values for testing edge cases and anomaly detection algorithms.

Tags: Analytics, Edge Cases
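
For numeric fields, an outlier rate can be pictured as occasionally replacing a value with one far outside the configured min/max range. The sketch below is a hypothetical Python illustration; the `factor` parameter and the choice of extremes are assumptions, not the generator's actual rule.

```python
import random


def inject_outliers(values, rate, low, high, factor=10, seed=0):
    """Replace each value, with probability `rate`, by one far outside [low, high]."""
    rng = random.Random(seed)
    span = high - low
    extremes = [low - factor * span, high + factor * span]
    return [rng.choice(extremes) if rng.random() < rate else v for v in values]


ages = [18, 30, 42, 65]
noisy_ages = inject_outliers(ages, rate=0.5, low=18, high=65)
# Anything outside the configured range is an injected outlier.
flagged = [a for a in noisy_ages if not 18 <= a <= 65]
print(noisy_ages, flagged)
```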
⚠️
Recommendation

High error rates (>20%) may produce data that's too noisy for most testing. Start with 5-10% for realistic scenarios.

Performance & Capacity

Built for speed and scale

1M+ records in seconds
100+ field types
270K records per second
100% client-side processing
Schema Complexity | Records | Time | Output Size
Simple (5 fields) | 1,000,000 | ~4 seconds | ~90 MB
Medium (30 fields) | 500,000 | ~15 seconds | ~500 MB
Complex (80+ fields) | 250,000 | ~24 seconds | ~730 MB
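
The benchmark figures above can be turned into throughput and per-record size with a little arithmetic. The Python sketch below does this using the approximate table values and assumes 1 MB = 1,000,000 bytes, so treat the results as rough estimates.

```python
# (label, records, seconds, output size in MB), taken from the table above.
BENCHMARKS = [
    ("Simple (5 fields)", 1_000_000, 4, 90),
    ("Medium (30 fields)", 500_000, 15, 500),
    ("Complex (80+ fields)", 250_000, 24, 730),
]


def summarize(records, seconds, size_mb):
    """Return (records per second, bytes per record), both rounded."""
    throughput = records / seconds
    bytes_per_record = size_mb * 1_000_000 / records
    return round(throughput), round(bytes_per_record)


for label, records, seconds, size_mb in BENCHMARKS:
    per_second, per_record = summarize(records, seconds, size_mb)
    print(f"{label}: ~{per_second:,} records/s, ~{per_record:,} bytes/record")
```

The simple schema works out to about 250,000 records per second, close to the 270K headline figure. Wider schemas generate fewer records per second because each record carries more fields and more bytes.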
Reproducible Data

Use the seed feature in Settings to generate identical data every time. Perfect for consistent test fixtures and sharing datasets.
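
The idea behind seeding can be shown with Python's standard `random` module: a generator started from the same seed produces the same sequence. This is a general sketch of the principle, not the generator's internal implementation; `generate_ages` is a hypothetical helper.

```python
import random


def generate_ages(count, seed):
    """Generate `count` ages between 18 and 65 from a seeded random generator."""
    rng = random.Random(seed)
    return [rng.randint(18, 65) for _ in range(count)]


first = generate_ages(5, seed=42)
second = generate_ages(5, seed=42)

# Same seed, same data: useful for test fixtures that must not change between runs.
assert first == second
```

A different seed will normally produce a different list, so sharing the seed along with the schema lets someone else regenerate the same dataset.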

Frequently Asked Questions

Quick answers to common questions

Your data is never sent to a server. All data generation happens entirely in your browser using JavaScript, so your schemas and generated data never leave your computer. The app works completely offline once loaded.

To use an existing database schema, recreate it manually in the Schema Builder, or export it to JSON/YAML and upload it. The field types map closely to common database column types.

To connect entities, use the Relationships tab. For example, create a "user_id" field in orders, then add a relationship from orders.user_id to users.id. The generator ensures referential integrity automatically.

To generate data in other languages, change the Locale setting. Names, addresses, and other locale-specific data are available in German, French, Spanish, Italian, Japanese, or Chinese, with appropriate formats and characters.

Use the "Enum" field type and enter comma-separated values like "active,pending,cancelled". For weighted distribution, use "Weighted Enum" with weights: "active:70,pending:20,cancelled:10".
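
To see what the weights mean in practice, the Python sketch below parses the "value:weight" string from the example above and samples from it. The parsing helper `parse_weighted_enum` is hypothetical; only the input format is taken from this page.

```python
import random
from collections import Counter


def parse_weighted_enum(spec):
    """Split 'active:70,pending:20' into (['active', 'pending'], [70.0, 20.0])."""
    values, weights = [], []
    for part in spec.split(","):
        value, _, weight = part.strip().partition(":")
        values.append(value)
        weights.append(float(weight))
    return values, weights


values, weights = parse_weighted_enum("active:70,pending:20,cancelled:10")

rng = random.Random(0)
sample = rng.choices(values, weights=weights, k=10_000)
counts = Counter(sample)
for value in values:
    print(f"{value}: {counts[value] / len(sample):.1%}")
```

With weights of 70, 20, and 10, roughly 70% of sampled values should be "active". The weights are relative, so they do not have to add up to 100.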

Fields marked unique can run short of distinct values in large datasets when the field type has a limited pool of values (like first names). Use UUID or ID types for truly unique fields, or reduce the record count.

The Synthetic Data Generator is completely free for personal and commercial projects. No usage limits, no sign-up required, and no watermarks on generated data.

If generation struggles with large datasets (100K+), try reducing the record count, simplifying your schema, using Chrome or Firefox, closing other tabs, or generating in smaller batches.
