Shiny Application User Guide

Overview

MILP Randomizer provides an interactive web interface for creating optimized stimulus orderings with custom constraints. This guide explains each feature and how to use the application effectively.

Getting Started

1. Data Input

Upload Your Data File

Supported formats: CSV, TSV, Excel (.xlsx, .xls)
Maximum size: Configurable (default 1000 rows, 120 in restricted mode)
Requirements:
- First row should contain column headers
- Each row represents one stimulus/trial
- Include all relevant attributes (stimulus type, category, ID, etc.)

Example Data Structure:

stimulus_id,category,valence,frequency
stim_001,noun,positive,high
stim_002,verb,neutral,medium
stim_003,noun,negative,low
...
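A file shaped like the example can be sanity-checked locally before upload. The sketch below uses only Python's standard csv module and is not part of MILP Randomizer; the function name check_stimulus_table is hypothetical, and the 1000-row limit is simply the default mentioned above.

```python
import csv
import io

MAX_ROWS = 1000  # default upload limit described above; configurable in the app


def check_stimulus_table(text, max_rows=MAX_ROWS):
    """Parse CSV text and return (column_names, rows) after basic checks."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    if not columns or any(not name.strip() for name in columns):
        raise ValueError("first row must contain non-empty column headers")
    rows = list(reader)  # one dict per stimulus/trial
    if len(rows) > max_rows:
        raise ValueError(f"{len(rows)} rows exceeds the limit of {max_rows}")
    return columns, rows


sample = """stimulus_id,category,valence,frequency
stim_001,noun,positive,high
stim_002,verb,neutral,medium
stim_003,noun,negative,low
"""

columns, rows = check_stimulus_table(sample)
print(columns)    # column names taken from the header row
print(len(rows))  # number of stimuli
```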

Data Preview Tab

View uploaded data summary
Check number of rows and columns
Verify column names match your expectations
Preview first 100 rows in interactive table

2. Define Constraints

Constraints control how stimuli can be ordered. The application supports two types:

Maximum Consecutive Repetitions

Prevents the same value from appearing too many times in a row.

Example: “No more than 2 consecutive trials of the same category”

Type: Max Consecutive
Column: category
Max Repetitions: 2

Result: Valid: [A, A, B, C, C, A] | Invalid: [A, A, A, B, C]
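The rule can be expressed as a check on the longest run of identical adjacent values. This is a minimal Python sketch of that check, not the application's own code; the function names are illustrative.

```python
def max_run_length(values):
    """Return the length of the longest run of identical adjacent values."""
    longest = current = 0
    previous = object()  # sentinel that compares unequal to every value
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def satisfies_max_consecutive(values, max_rep):
    """True if no value appears more than max_rep times in a row."""
    return max_run_length(values) <= max_rep


print(satisfies_max_consecutive(["A", "A", "B", "C", "C", "A"], 2))  # True
print(satisfies_max_consecutive(["A", "A", "A", "B", "C"], 2))       # False
```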

Minimum Distance Between Repetitions

Ensures spacing between occurrences of the same value.

Example: “At least 3 positions between same stimulus”

Type: Min Distance
Column: stimulus_id
Min Distance: 3

Result: Valid: [A, B, C, D, A] (3 items between the two A’s) | Invalid: [A, B, C, A] (only 2 items between)
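A minimal Python sketch of this check follows. It assumes, based on the example above, that "distance" counts the items lying between two occurrences of the same value; the application's exact definition may differ, and the function name is illustrative.

```python
def satisfies_min_distance(values, min_dist):
    """True if equal values always have at least min_dist items between them."""
    last_seen = {}  # value -> index of its most recent occurrence
    for position, value in enumerate(values):
        if value in last_seen:
            items_between = position - last_seen[value] - 1
            if items_between < min_dist:
                return False
        last_seen[value] = position
    return True


print(satisfies_min_distance(["A", "B", "C", "D", "A"], 3))  # True
print(satisfies_min_distance(["A", "B", "C", "A"], 3))       # False
```

Comparing each occurrence only with the previous occurrence of the same value is enough, because earlier occurrences are even further away.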

Managing Constraints

Add Constraint

Click “Add Constraint” button
Select constraint type
Choose column from your data
Set parameter value (max repetitions or min distance)

Remove Constraint

Click the × button on any constraint card

Import Constraints

Click “Import” button
Select a TSV/CSV file with constraints
File must have columns: type, column, max_rep (for max_consecutive), min_dist (for min_distance)
Important: Column names must match your dataset exactly
If columns don’t match, you’ll see an error listing available columns
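As an illustration, a constraints file with the columns listed above might look like the TSV text in this Python sketch. The type values max_consecutive and min_distance come from the column description; leaving the unused parameter cell empty is an assumption, and load_constraints is a hypothetical helper rather than the application's importer.

```python
import csv
import io

# Hypothetical constraints file; empty cells mark the parameter that does not apply.
constraints_tsv = (
    "type\tcolumn\tmax_rep\tmin_dist\n"
    "max_consecutive\tcategory\t2\t\n"
    "min_distance\tstimulus_id\t\t3\n"
)

PARAMETER_FOR_TYPE = {"max_consecutive": "max_rep", "min_distance": "min_dist"}


def load_constraints(text, dataset_columns):
    """Parse constraint rows and check that each column exists in the dataset."""
    constraints = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if row["type"] not in PARAMETER_FOR_TYPE:
            raise ValueError(f"unknown constraint type {row['type']!r}")
        if row["column"] not in dataset_columns:
            raise ValueError(
                f"unknown column {row['column']!r}; "
                f"available columns: {sorted(dataset_columns)}"
            )
        value = int(row[PARAMETER_FOR_TYPE[row["type"]]])
        constraints.append((row["type"], row["column"], value))
    return constraints


dataset_columns = ["stimulus_id", "category", "valence", "frequency"]
print(load_constraints(constraints_tsv, dataset_columns))
```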

Export Constraints

Click “Export Constraints” button
Saves current constraints to TSV file
Filename: constraints_[datafile]_YYYY-MM-DD.tsv
Use this file to reuse constraints with similar datasets

Constraint Numbering

Constraints are numbered sequentially for display
Numbers update automatically when constraints are removed
Import replaces all existing constraints (with confirmation)

3. Generation Settings

Number of Sequences

How many different orderings to generate
Range: 1 to configurable maximum (default 100, 20 in restricted mode)
More sequences = better diversity analysis but longer computation

Strategy

Heuristic First (Default, recommended for most cases)

Tries fast heuristic search first
Falls back to MILP if heuristic fails
Usually completes in seconds to minutes
May not always find optimal solution

MILP First (Proven optimal solutions when the solver finishes in time)

Uses mathematical optimization (HiGHS solver)
Falls back to heuristic if MILP times out
May take minutes to hours for complex problems
Provides proven optimal solutions when successful

Note: In restricted mode, only heuristic method is available.

Heuristic Max Attempts

How many random orderings to try
Higher = better chance of success but slower
Recommended: 5,000 for simple problems, 10,000+ for complex
Maximum: Configurable (default 50,000, 5,000 in restricted mode)
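The role of the attempt limit can be illustrated with the simplest possible search: shuffle, test, and repeat until an ordering passes or the attempts run out. The application's actual heuristic is not documented here and may be more sophisticated, so treat this Python sketch (with hypothetical names) as an illustration only.

```python
import random


def longest_run(values):
    """Length of the longest run of identical adjacent values."""
    longest = current = 0
    previous = object()  # sentinel that compares unequal to every value
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def shuffle_search(items, max_rep, max_attempts=5000, seed=None):
    """Try random orderings until one respects max_rep or attempts run out."""
    rng = random.Random(seed)
    order = list(items)
    for attempt in range(1, max_attempts + 1):
        rng.shuffle(order)
        if longest_run(order) <= max_rep:
            return list(order), attempt
    return None, max_attempts  # no valid ordering found within the budget


categories = ["noun"] * 4 + ["verb"] * 4 + ["adj"] * 4
order, attempts = shuffle_search(categories, max_rep=2, seed=1)
print(order, attempts)
```

A search like this never proves infeasibility: when it returns nothing, the problem may be impossible or the budget may simply have been too small.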

MILP Settings (if enabled)

MILP Timeout

Maximum time allowed for MILP solver (seconds)
Increase for complex problems
Recommended: 120s default, 300-600s for difficult problems

MILP Solver

HiGHS (recommended): Fast, modern solver
Rglpk: Alternative solver, sometimes better for specific problems
ROI: Generic interface, uses available solvers

Additional Options

Verify Constraints

Checks generated sequences satisfy all constraints
Recommended: Keep enabled
Adds minimal computation time

Compute Diversity

Calculates variability between sequences
Compares to random baseline
Always computed when checked, even without constraints
Useful for understanding constraint impact on diversity

Show Detailed Progress

Displays verbose console output
Shows attempt numbers, solver status, timing
Helpful for troubleshooting

Random Seed

Optional number for reproducible results
Same seed + same data/constraints = identical sequences
Leave empty for random generation
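The effect of a seed can be illustrated with Python's own random number generator. The application uses its own generator, so the same number will not reproduce the application's sequences; only the principle carries over.

```python
import random


def random_ordering(items, seed=None):
    """Return a shuffled copy; seed=None gives a different result on each run."""
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    return order


stimuli = [f"stim_{i:03d}" for i in range(1, 11)]
first = random_ordering(stimuli, seed=42)
second = random_ordering(stimuli, seed=42)
print(first == second)  # True: same seed and same input give the same ordering
```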

4. Analysis Tabs

Feasibility Check

Purpose: Verify constraints before generation

Feasibility Analysis

Checks if constraints can be satisfied simultaneously
Identifies logical conflicts
Example conflict: “Max 1 consecutive of A” when all 3 items in the dataset are A
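Some conflicts can be detected by counting alone. If a value occurs c times among n items, the other n - c items leave at most n - c + 1 gaps, and each gap can hold at most max_rep copies. The Python sketch below applies that necessary condition; it is not the application's feasibility analysis, and passing it does not prove that a valid ordering exists once several constraints interact.

```python
from collections import Counter


def max_consecutive_conflicts(values, max_rep):
    """Return {value: count} for values too frequent to fit under max_rep."""
    counts = Counter(values)
    total = len(values)
    conflicts = {}
    for value, count in counts.items():
        gaps = total - count + 1  # positions before, between, and after the other items
        if count > max_rep * gaps:
            conflicts[value] = count
    return conflicts


print(max_consecutive_conflicts(["A", "A", "A"], 1))  # {'A': 3}
print(max_consecutive_conflicts(["A", "A", "B"], 1))  # {}
```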

Complexity Analysis

Estimates computational difficulty
Helps choose appropriate strategy

Metrics Explained:

Stimuli: Dataset size
Binary Variables: Optimization problem size (stimuli × positions)
Constraint Equations: Total mathematical constraints generated
Constraint Density: Equations/variables ratio (>1.0 = very constrained)
Problem Scale: Logarithmic complexity measure
- <3: Low complexity
- 3-5: Moderate complexity
- 5-7: High complexity
- >7: Very high complexity
Difficulty: Overall assessment with method recommendation
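The arithmetic behind the first few metrics is simple enough to sketch in Python. The formula for Problem Scale is not given in this guide, so the sketch only classifies a scale value that has already been computed. The equation count used in the example is made up, values above 7 are treated as very high, and the handling of exact boundary values (3, 5, 7) is an assumption.

```python
def binary_variables(n_stimuli):
    """One yes/no variable per (stimulus, position) pair."""
    return n_stimuli * n_stimuli


def constraint_density(n_equations, n_variables):
    """Ratio of constraint equations to binary variables."""
    return n_equations / n_variables


def scale_band(problem_scale):
    """Map a Problem Scale value onto the complexity bands listed above."""
    if problem_scale < 3:
        return "Low"
    if problem_scale <= 5:
        return "Moderate"
    if problem_scale <= 7:
        return "High"
    return "Very high"


variables = binary_variables(120)            # 120 stimuli -> 14400 variables
density = constraint_density(7200, variables)  # hypothetical equation count
print(variables, density, scale_band(4.2))
```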

What’s the Difference?

Feasibility: Can constraints be satisfied? (yes/no question)
Complexity: How hard to find solutions? (computational question)

A problem can be feasible but very complex, or infeasible but simple.

Generation Results

Generation Summary

Total sequences attempted
Successful vs. failed
Methods used (heuristic/MILP)
Average generation time

Console Output

Detailed log of generation process
Visible when “Show Detailed Progress” enabled
Shows per-sequence status

Error Messages

Displayed if generation fails
Includes suggestions for resolution:
- Check feasibility
- Relax constraints
- Change strategy
- Increase timeouts

Summary Table

One row per sequence
Columns: ID, status, method, size, time, constraint satisfaction
Sortable and searchable

Diversity Analysis

Purpose: Understand variability between generated sequences

What is Diversity?

Measures how different sequences are from each other
Uses Kendall Tau distance (rank-based metric: the share of item pairs that appear in opposite order in two sequences)
Range: 0 (identical) to 1 (maximally different)
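The normalized Kendall tau distance can be computed by counting the item pairs whose relative order differs between two sequences and dividing by the total number of pairs. This is a plain Python sketch that assumes each sequence orders the same set of unique items (for example stimulus IDs); it is not the application's implementation.

```python
from itertools import combinations


def kendall_tau_distance(order_a, order_b):
    """Normalized Kendall tau distance between two orderings of the same items."""
    if len(order_a) != len(order_b) or set(order_a) != set(order_b):
        raise ValueError("both orderings must contain the same items")
    if len(set(order_a)) != len(order_a):
        raise ValueError("items must be unique")
    rank_b = {item: index for index, item in enumerate(order_b)}
    ranks = [rank_b[item] for item in order_a]
    n = len(ranks)
    if n < 2:
        return 0.0
    # A pair is discordant when the two orderings disagree on which item comes first.
    discordant = sum(1 for i, j in combinations(range(n), 2) if ranks[i] > ranks[j])
    return discordant / (n * (n - 1) / 2)


print(kendall_tau_distance(["a", "b", "c", "d"], ["a", "b", "c", "d"]))  # 0.0
print(kendall_tau_distance(["a", "b", "c", "d"], ["d", "c", "b", "a"]))  # 1.0
```

This pairwise count is quadratic in sequence length, which is acceptable for datasets of the sizes discussed in this guide.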

Metrics Displayed:

Kendall Tau Distance

Mean: Average dissimilarity across all sequence pairs
SD: Standard deviation of distances
Higher mean = more diverse sequences

Random Baseline

Expected diversity if sequences were fully random
Provides context for interpreting observed diversity
Computed from 100 random orderings

Diversity Retention

Percentage: (Observed / Baseline) × 100%
Shows how much variability survived constraint application
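The retention formula can be sketched in Python as follows. The baseline here is the mean pairwise distance among 100 random orderings, matching the description above. How the application samples those orderings is not specified, so the sampling and the function names are assumptions, and sequences are assumed to contain unique items.

```python
import random
from itertools import combinations


def kendall_tau_distance(order_a, order_b):
    """Share of item pairs ordered differently in the two sequences."""
    rank_b = {item: index for index, item in enumerate(order_b)}
    ranks = [rank_b[item] for item in order_a]
    n = len(ranks)
    discordant = sum(1 for i, j in combinations(range(n), 2) if ranks[i] > ranks[j])
    return discordant / (n * (n - 1) / 2)


def mean_pairwise_distance(sequences):
    """Average distance over all pairs of sequences (needs at least 2)."""
    pairs = list(combinations(sequences, 2))
    return sum(kendall_tau_distance(a, b) for a, b in pairs) / len(pairs)


def diversity_retention(sequences, n_random=100, seed=0):
    """Return (observed, baseline, retention in percent)."""
    rng = random.Random(seed)
    items = list(sequences[0])
    random_orders = [rng.sample(items, len(items)) for _ in range(n_random)]
    observed = mean_pairwise_distance(sequences)
    baseline = mean_pairwise_distance(random_orders)
    return observed, baseline, 100.0 * observed / baseline


items = list(range(8))
observed, baseline, retention = diversity_retention([items, items[::-1]])
print(observed, round(baseline, 3), round(retention, 1))
```

Random orderings differ on about half of all item pairs, so the baseline sits near 0.5. A set of sequences whose mean distance is above that value, such as the forward and reversed pair in the example, yields a retention above 100%.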

Interpreting Retention:

<30%: Very low diversity (highly restrictive constraints)
30-50%: Low diversity (moderate restrictions)
50-80%: Moderate diversity (balanced constraints)
80-100%: High diversity (flexible constraints)
>100%: Normal for unconstrained or lightly constrained sequences

Why >100%?

Optimization algorithms can find more diverse orderings than pure randomization
Common for unconstrained sequences or when constraints allow flexibility
Indicates the algorithm explored solution space effectively

Pairwise Distance Heatmap

Visual representation of all sequence pairs
Blue = similar, Red = different
Diagonal is always 0 (sequence compared to itself)

Note: Diversity requires at least 2 successful sequences and “Compute Diversity” enabled.

Sequence Preview

Select any generated sequence from dropdown
View full ordering in table format
Shows all columns (or selected output columns)
Includes position column if enabled
Useful for manual inspection

5. Export Results

Output Settings

Output Format

TSV: Tab-separated (recommended, Excel-compatible)
CSV: Comma-separated
Excel: Native .xlsx format

Columns to Include

Select which data columns to export
Use Ctrl/Cmd+click for multiple selection
Default: All columns

Include Header

First row contains column names
Recommended: Keep enabled

Include Position

Adds position column (1, 2, 3, …)
Shows ordering explicitly
Recommended: Keep enabled
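The effect of the header and position options can be shown with a small Python sketch that writes one sequence as delimited text. The column name "position" and the function name are assumptions; the application's actual output may label the column differently.

```python
import csv
import io


def write_sequence(rows, columns, include_header=True, include_position=True,
                   delimiter="\t"):
    """Serialize an ordered list of row dicts as delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    if include_header:
        writer.writerow((["position"] if include_position else []) + list(columns))
    for position, row in enumerate(rows, start=1):  # positions start at 1
        values = [row[column] for column in columns]
        writer.writerow(([position] if include_position else []) + values)
    return buffer.getvalue()


sequence = [
    {"stimulus_id": "stim_002", "category": "verb"},
    {"stimulus_id": "stim_001", "category": "noun"},
]
text = write_sequence(sequence, ["stimulus_id", "category"])
print(text)
```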

Download Options

Download All Sequences (ZIP)

Creates ZIP archive with all sequence files
Individual files named: [datafile]_seq001.tsv, [datafile]_seq002.tsv, etc.
Archive named: [datafile]_sequences_YYYY-MM-DD.zip
Preserves folder structure

Download Summary

Exports summary table
Same format as selected in Output Format
Named: [datafile]_summary_YYYY-MM-DD.tsv
Contains: sequence IDs, methods, times, constraint status

File Naming

All exports use your original data filename:

Sequences: [datafile]_seq001.tsv
Archive: [datafile]_sequences_2024-12-20.zip
Summary: [datafile]_summary_2024-12-20.tsv
Constraints: constraints_[datafile]_2024-12-20.tsv
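The naming pattern can be reproduced with a few lines of Python, for example to locate exported files from a script. The sketch assumes that [datafile] is the uploaded filename without its extension, which this guide does not state explicitly; export_names is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def export_names(data_file, n_sequences, on=None, ext="tsv"):
    """Build export filenames following the patterns listed above."""
    stem = Path(data_file).stem              # assumed: filename without extension
    stamp = (on or date.today()).isoformat()  # YYYY-MM-DD
    return {
        "sequences": [f"{stem}_seq{i:03d}.{ext}" for i in range(1, n_sequences + 1)],
        "archive": f"{stem}_sequences_{stamp}.zip",
        "summary": f"{stem}_summary_{stamp}.{ext}",
        "constraints": f"constraints_{stem}_{stamp}.tsv",
    }


names = export_names("stimuli.csv", 2, on=date(2024, 12, 20))
print(names["sequences"])  # ['stimuli_seq001.tsv', 'stimuli_seq002.tsv']
print(names["archive"])    # stimuli_sequences_2024-12-20.zip
```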

Workflows

Basic Workflow

Upload data file
Add 1-2 constraints
Check feasibility (optional but recommended)
Click “Generate Sequences”
Review results
Download sequences

Advanced Workflow

Upload data file
Preview data, verify columns
Add multiple constraints OR import constraint file
Navigate to “Feasibility Check”
- Review complexity analysis
- Adjust constraints if difficulty is “Very High”
Set generation parameters based on complexity:
- Low/Moderate: Heuristic first, 5000 attempts
- High: MILP first, 300s timeout
- Very High: Consider relaxing constraints
Enable “Compute Diversity” and “Verify Constraints”
Generate sequences
Analyze diversity metrics
Preview individual sequences
Configure export settings
Download results
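The mapping from difficulty to generation parameters in this workflow can be written down as a small lookup table, which is handy when documenting the settings used for a study. The Python sketch below only restates the recommendations above; the key names are hypothetical and do not correspond to application settings.

```python
# Recommendations from the workflow above, keyed by reported difficulty.
RECOMMENDED_SETTINGS = {
    "Low": {"strategy": "heuristic_first", "heuristic_attempts": 5000},
    "Moderate": {"strategy": "heuristic_first", "heuristic_attempts": 5000},
    "High": {"strategy": "milp_first", "milp_timeout_seconds": 300},
    "Very High": {"advice": "consider relaxing constraints"},
}


def recommend(difficulty):
    """Look up the suggested starting point for a difficulty label."""
    try:
        return RECOMMENDED_SETTINGS[difficulty]
    except KeyError:
        raise ValueError(f"unknown difficulty {difficulty!r}") from None


print(recommend("High"))
```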

Troubleshooting Workflow

If generation fails:

Check Feasibility
- Go to “Feasibility Check” tab
- Look for constraint conflicts
- Note: Feasible doesn’t guarantee success, especially for complex problems
Review Complexity
- If difficulty is “High” or “Very High”:
  - Increase MILP timeout (e.g., 300-600 seconds)
  - Try MILP-first strategy
  - Consider relaxing constraints
Examine Console Output
- Enable “Show Detailed Progress”
- Look for specific error messages
- Check which method was attempted
Adjust Constraints
- Increase max_rep (allow more consecutive repetitions)
- Decrease min_dist (require less spacing)
- Remove one constraint temporarily to identify conflicts
Try Different Settings
- Increase heuristic attempts (10,000-20,000)
- Switch MILP solver (try Rglpk if HiGHS fails)
- Generate fewer sequences initially (test with 1-2)

Server Restrictions

If using a deployed server with restricted mode:

Limitations:

Maximum dataset size: 120 rows (configurable)
Maximum sequences: 20 (configurable)
Maximum heuristic attempts: 5,000 (configurable)
MILP solvers disabled (heuristic only)

Notice: A blue information box at the top displays current limits.

Full Version: Download and install locally to remove all restrictions and access MILP optimization.

Tips and Best Practices

Data Preparation

Use clear, descriptive column names
Avoid special characters in column names
Ensure data types are consistent within columns
Remove any completely empty rows/columns
Test with a small subset first (20-50 rows)

Constraint Design

Start simple: Add one constraint, test, then add more
Check feasibility after each constraint addition
More constraints = longer computation time
Very restrictive constraints may have no solutions
Balance constraint strictness with feasibility

Performance Optimization

For small datasets (<50 stimuli):
- Either method works well
- Heuristic is faster
For medium datasets (50-150 stimuli):
- Try heuristic first with 10,000 attempts
- Use MILP if heuristic fails or you need optimality
For large datasets (>150 stimuli):
- Heuristic is usually necessary
- MILP may be very slow or time out
- Consider simplifying constraints

Diversity Analysis

Always enable for research applications
Compare retention across different constraint sets
>80% retention suggests the constraints are not overly restrictive
<30% retention may indicate overly strict constraints
>100% retention is normal for flexible/unconstrained sequences

Reproducibility

Set random seed for reproducible results
Export constraints for future use
Save summary table with methods and parameters used
Document strategy and timeout settings

Frequently Asked Questions

Q: How long should generation take?

A: Depends on complexity:

Simple problems (low complexity): Seconds
Moderate problems: 1-5 minutes
Complex problems (high complexity): 5-30 minutes
Very complex problems: 30+ minutes or may time out

Q: What if all sequences fail?

A: Check feasibility first. If feasible:

Increase timeouts
Try different strategy
Relax constraints
Verify data is correct

Q: Can I use the same constraints with different data?

A: Yes, if column names match. Export constraints from one session, import to another. If column names don’t match, you’ll see an error with available columns.

Q: Why is diversity >100%?

A: This is normal! The optimization algorithms can find more diverse orderings than pure randomization, especially for unconstrained or lightly constrained problems.

Q: Should I use heuristic or MILP?

Start with heuristic for speed
Use MILP if you need guaranteed optimal solutions
MILP is better for complex problems if you can wait
Check complexity analysis for recommendations

Q: What’s the difference between “max_consecutive” and “min_distance”?

max_consecutive: Controls immediate repetitions (back-to-back)
min_distance: Controls spacing across entire sequence
Example: max_consecutive=1 with min_distance=3
- Valid: [A, B, C, D, A] (A’s are 4 apart)
- Invalid: [A, A, B, C] (consecutive A’s)
- Invalid: [A, B, A, C] (A’s only 2 apart)

Q: Can I interrupt generation?

A: Not easily from the UI. If it’s taking too long:

Wait for timeout (MILP) or max attempts (heuristic)
Refresh browser to restart (loses progress)
Set lower timeout/attempts for future runs

Q: How do I cite this tool?

A: See README.md for citation information.

Q: Where is my data stored?

Local version: Only on your computer
Server deployment: Temporarily on server, not permanently stored
Generated sequences: Downloaded to your computer
No data is transmitted to external services
