Data Collection Tools Tutorial

Tutorial on how to collect, process, and analyze AI standards data with the Data Collection Tools

Overview

The Data Collection Tools section provides three tools for gathering and processing AI standards data from multiple sources. Each tool has a specific purpose, and together they form a single workflow.

Three Main Cards

1. One-Click Data Collection

Purpose: Automatically collect AI standards data from three sources:

Button: Start Data Collection - Click to begin collecting data from all three sources

Result: Data is saved to your personal database with user isolation

2. Data Processing

Purpose: Process and standardize the collected data through 8 intelligent steps:

Button: Start Data Processing - Click to process your collected data

Button: Download Processed Data - Download the processed CSV file after completion

3. Data Analysis

Purpose: Generate visualizations and discover terminology

Buttons:

Advanced Options

Click the "Advanced Options" dropdown to configure collection and processing parameters:

Data Source Options

Collection Parameters

Processing Parameters

AI Model Configuration (DeepSeek R1-14b)

How to Use Your Own DeepSeek (⚠️ Requires at least 16 GB of GPU memory on your local computer):

  1. Check the checkbox: "Use my own DeepSeek R1-14b deployment"
  2. Input field will appear: Enter your DeepSeek URL (format: http://your-ip:11434)
  3. Click "Test Connection": Verify your DeepSeek service is accessible
  4. Wait for result: Green ✅ = Success, Red ❌ = Failed
  5. Save automatically: Configuration saves when you leave the input field

Finding Your DeepSeek API Endpoint

The DeepSeek API endpoint is your computer's IP address + port 11434:

  1. Open PowerShell or Command Prompt
  2. Run: ipconfig
  3. Find IPv4 Address (e.g., 192.168.1.100)
  4. Your endpoint: http://192.168.1.100:11434
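
The endpoint format described above can be sketched in Python. This is a minimal illustration, not part of the tool itself: the function name `build_endpoint` is hypothetical, and the only facts taken from this tutorial are the `http://<ip>:11434` format and the example address.

```python
import ipaddress

OLLAMA_PORT = 11434  # port used throughout this tutorial


def build_endpoint(ipv4: str, port: int = OLLAMA_PORT) -> str:
    """Return http://<ip>:<port> after checking that ipv4 is a valid IPv4 address."""
    # IPv4Address raises a ValueError subclass on malformed input.
    address = ipaddress.IPv4Address(ipv4.strip())
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"http://{address}:{port}"


print(build_endpoint("192.168.1.100"))  # http://192.168.1.100:11434
```

Validating the address first catches copy-paste mistakes (for example, pasting the subnet mask line from ipconfig) before the URL is entered in Advanced Options.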
How to Deploy and Configure Your Own DeepSeek

Step 1: Install Ollama

Step 2: Download DeepSeek R1-14b Model

Step 3: Configure Ollama to Accept External Connections

⚠️ Important: Temporary vs Permanent Configuration

Temporary (current terminal session only):

  • Windows (Command Prompt): set OLLAMA_HOST=0.0.0.0:11434
  • Windows (PowerShell): $env:OLLAMA_HOST="0.0.0.0:11434"
  • Mac/Linux: export OLLAMA_HOST=0.0.0.0:11434
  • ⚠️ This setting is lost when you close the terminal

Permanent (recommended):

  • Windows: Add to System Environment Variables
    • Search: Edit System Environment Variables
    • Environment Variables → System variables → New
    • Variable name: OLLAMA_HOST
    • Variable value: 0.0.0.0:11434
    • Restart your computer or terminal
  • Mac/Linux: Add to shell profile
    • Edit ~/.bashrc or ~/.zshrc
    • Add line: export OLLAMA_HOST=0.0.0.0:11434
    • Run: source ~/.bashrc (or restart terminal)
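
To confirm that the variable is actually visible in the current session, a small Python check like the one below may help. It is a sketch under assumptions: the helper names are hypothetical, and it only understands the plain `host:port` form used in this tutorial (or a bare host), not every value Ollama may accept.

```python
import os

DEFAULT_BIND = "127.0.0.1:11434"  # Ollama's default bind address when OLLAMA_HOST is unset
DEFAULT_PORT = 11434


def parse_ollama_host(value: str) -> tuple[str, int]:
    """Split a 'host:port' value; a bare host gets the default port."""
    value = value.strip()
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, DEFAULT_PORT
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def binds_all_interfaces(value: str) -> bool:
    """True when the host part is 0.0.0.0, i.e. reachable from other machines."""
    host, _ = parse_ollama_host(value)
    return host == "0.0.0.0"


current = os.environ.get("OLLAMA_HOST", DEFAULT_BIND)
try:
    print(current, "-> binds all interfaces:", binds_all_interfaces(current))
except ValueError as exc:
    print("unrecognized OLLAMA_HOST:", exc)
```

If this reports the default 127.0.0.1 address in a new terminal, the permanent configuration has not taken effect yet and Ollama will only accept local connections.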

Step 4: Start Ollama Service

Step 5: Find Your DeepSeek API Endpoint

Step 6: Configure Firewall

Step 7: Test Connection (3 Methods)

Step 8: Save Configuration

Workflow & Button Dependencies

  1. Start Data Collection: Always available. Click to begin collecting data from three sources. This is the starting point of the workflow.
  2. Start Data Processing: Enabled after collection completes. Processes your collected data through 8 steps.
  3. Download Processed Data: Enabled after processing completes. Download your processed standards as CSV.
  4. Run Visualization: Enabled after processing completes. Generates 14 interactive charts for analysis.
  5. Create Glossary: Enabled after processing completes. Builds terminology database from 4 sources.
  6. Discover Terms: Enabled after glossary is created. Uses AI to find new terms in your standards.
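
The button dependencies in this section can be modeled as a simple prerequisite table. The sketch below is only an illustration of that rule set; the stage names ("collection", "processing", "glossary") and the function `enabled_buttons` are hypothetical and do not reflect how the tool is implemented.

```python
# Prerequisite stage for each button, as listed in this section (None = always available).
PREREQUISITE = {
    "Start Data Collection": None,
    "Start Data Processing": "collection",
    "Download Processed Data": "processing",
    "Run Visualization": "processing",
    "Create Glossary": "processing",
    "Discover Terms": "glossary",
}


def enabled_buttons(completed: set[str]) -> list[str]:
    """Return the buttons whose prerequisite stage is in the completed set."""
    return [
        button
        for button, needed in PREREQUISITE.items()
        if needed is None or needed in completed
    ]


print(enabled_buttons({"collection"}))
# ['Start Data Collection', 'Start Data Processing']
```

Keeping the rules in one table makes it easy to see why a button is disabled: look up its prerequisite and check whether that stage has finished.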

Quick Start Scenarios

Scenario 1: Full Workflow (Collect Your Own Data)

Steps:

  1. Click Start Data Collection and wait for completion
  2. Click Start Data Processing and wait for completion
  3. Click Run Visualization to see 14 interactive charts
  4. Click Create Glossary to build a terminology database
  5. Click Discover Terms to find new terms with AI

Scenario 2: Quick Analysis (Use Official AI Standards Database)

Steps:

  1. Open Advanced Options
  2. Check "Use Official AI Standards Database"
  3. The first two cards become disabled (no need to collect/process)
  4. Click Run Visualization immediately
  5. Click Create Glossary to build your glossary
  6. Click Discover Terms when ready

Scenario 3: Skip Glossary Creation (Use Official AI Glossary Database)

Steps:

  1. Open Advanced Options
  2. Check "Use Official AI Glossary Database"
  3. Complete normal workflow (collect → process → visualize)
  4. Create Glossary button is disabled (no need to create)
  5. Click Discover Terms directly after processing

Tips & Notes

Pro Tip: Use Official AI Standards Database

For quick analysis without waiting, check "Use Official AI Standards Database" in Advanced Options. This skips the 30-40 minutes of collection and processing time.

⚠️ Important Notes

Checkbox Logic

Official Standards Database Checkbox

When checked:

Official Glossary Database Checkbox

When checked:

❓ Frequently Asked Questions

Q: Why are some buttons disabled?

Buttons follow a workflow sequence. Each step must complete before the next becomes available. Check if you've completed the previous steps or if you've enabled official database options.

Q: How long does data collection take?

Typically 30-40 minutes, depending on the number of pages configured in Advanced Options.

Q: Can I use the tools without collecting data?

Yes! Check "Use Official AI Standards Database" in Advanced Options to skip collection and processing.

Q: What happens if I check both official database checkboxes?

Both collection and processing cards become disabled. You can directly use Run Visualization and Discover Terms with official data.
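
How the two checkboxes interact with the workflow can be sketched as below. This is one interpretation of the scenarios and answers in this tutorial, not the tool's actual code: the `Options` class and `button_states` function are hypothetical, and Download Processed Data is left out because the tutorial does not say how it behaves with official data.

```python
from dataclasses import dataclass


@dataclass
class Options:
    use_official_standards: bool = False
    use_official_glossary: bool = False


def button_states(
    opts: Options,
    collected: bool = False,
    processed: bool = False,
    glossary_created: bool = False,
) -> dict[str, bool]:
    """Map each button to True (enabled) or False (disabled)."""
    # Official data stands in for the user's own processed data or glossary.
    data_ready = opts.use_official_standards or processed
    glossary_ready = opts.use_official_glossary or glossary_created
    return {
        "Start Data Collection": not opts.use_official_standards,
        "Start Data Processing": not opts.use_official_standards and collected,
        "Run Visualization": data_ready,
        "Create Glossary": data_ready and not opts.use_official_glossary,
        "Discover Terms": data_ready and glossary_ready,
    }


both = button_states(Options(use_official_standards=True, use_official_glossary=True))
print([name for name, enabled in both.items() if enabled])
# ['Run Visualization', 'Discover Terms']
```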

Q: Can I use my own DeepSeek model for AI classification?

Yes! You can deploy DeepSeek R1-14b on your own computer and configure the system to use it:

  1. Install Ollama on your computer (https://ollama.com/)
  2. Pull the model: ollama pull deepseek-r1:14b
  3. Configure Ollama to accept external connections: set OLLAMA_HOST=0.0.0.0:11434 (Windows) or export OLLAMA_HOST=0.0.0.0:11434 (Mac/Linux)
  4. Start Ollama: ollama serve
  5. Open firewall port 11434
  6. In Advanced Options, check "Use my own DeepSeek R1-14b deployment"
  7. Enter your DeepSeek URL (e.g., http://192.168.1.100:11434)
  8. Click "Test Connection" to verify
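
Before clicking "Test Connection", the service can also be checked from another machine on the network with a short script. The sketch below assumes Ollama's standard `/api/tags` endpoint, which lists locally pulled models; the function names are hypothetical, and the tool's own "Test Connection" button may check something different.

```python
import http.client
import json
import urllib.request

MODEL = "deepseek-r1:14b"  # model tag from `ollama pull deepseek-r1:14b`


def model_available(tags_response: dict, model: str = MODEL) -> bool:
    """Check a parsed /api/tags response for the given model name."""
    return any(m.get("name") == model for m in tags_response.get("models", []))


def check_ollama(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers and lists the model, False otherwise."""
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return model_available(json.load(resp))
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts are OSError subclasses; a malformed URL or
        # invalid JSON raises ValueError.
        return False


# Example (replace with your own endpoint):
# print(check_ollama("http://192.168.1.100:11434"))
```

A False result from another machine, combined with a True result on the computer running Ollama, usually points to the OLLAMA_HOST setting or the firewall rule for port 11434.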

Q: Why can't the server connect to my DeepSeek service?

Common reasons and solutions:

Q: What if my DeepSeek service becomes unavailable during processing?

The processing will fail with an error message. You can:

Q: Is my DeepSeek configuration private?

Yes. Each user's DeepSeek configuration is stored privately in the database. Other users cannot see or use your configuration.
