Understanding Feasibility vs. Complexity

Feasibility Analysis checks whether it is theoretically possible to satisfy all constraints simultaneously.

Complexity Analysis estimates the computational difficulty of finding valid solutions.

Problem Complexity Analysis

Interpreting Complexity Metrics
  • Number of Stimuli: Size of your dataset.
  • Binary Variables: Number of optimization variables.
  • Constraint Equations: Total mathematical constraints generated.
  • Constraint Density: Ratio of equations to variables.
  • Problem Scale: Overall size metric (logarithmic).
  • Estimated Difficulty: Overall assessment with recommendations.

            


Understanding Diversity Metrics

Diversity metrics compare the variability between generated sequences to a random baseline.

  • Kendall Tau Distance: Measures dissimilarity (0-1).
  • Random Baseline: Expected diversity from random orderings.
  • Diversity Retention: Percentage preserved. >100% possible with optimization.

File Naming

Files are named based on your uploaded data file:

  • Sequences: [datafile]_seq001.tsv, [datafile]_seq002.tsv, ...
  • Archive: [datafile]_sequences_[date].zip
  • Summary: [datafile]_summary_[date].tsv

Shiny Application User Guide

Overview

MILP Randomizer provides an interactive web interface for creating optimized stimulus orderings with custom constraints. This guide explains each feature and how to use the application effectively.

Getting Started

1. Data Input

Upload Your Data File

  • Supported formats: CSV, TSV, Excel (.xlsx, .xls)
  • Maximum size: Configurable (default 1000 rows, 120 in restricted mode)
  • Requirements:
    • First row should contain column headers
    • Each row represents one stimulus/trial
    • Include all relevant attributes (stimulus type, category, ID, etc.)

Example Data Structure:

stimulus_id,category,valence,frequency
stim_001,noun,positive,high
stim_002,verb,neutral,medium
stim_003,noun,negative,low
...

Data Preview Tab

  • View uploaded data summary
  • Check number of rows and columns
  • Verify column names match your expectations
  • Preview first 100 rows in interactive table

2. Define Constraints

Constraints control how stimuli can be ordered. The application supports two types:

Maximum Consecutive Repetitions

Prevents the same value from appearing too many times in a row.

Example: “No more than 2 consecutive trials of the same category”

  • Type: Max Consecutive
  • Column: category
  • Max Repetitions: 2

Result: Valid: [A, A, B, C, C, A] | Invalid: [A, A, A, B, C]
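
The rule above can be checked with a few lines of code. The following is a minimal sketch in Python, used here only for illustration; the function names are hypothetical and this is not the application's own implementation.

```python
from itertools import groupby


def longest_run(values):
    """Length of the longest run of identical adjacent values (0 for empty input)."""
    return max((sum(1 for _ in group) for _, group in groupby(values)), default=0)


def satisfies_max_consecutive(values, max_rep):
    """True if no value appears more than max_rep times in a row."""
    return longest_run(values) <= max_rep


# The two orderings from the example above, with Max Repetitions = 2.
print(satisfies_max_consecutive(list("AABCCA"), 2))  # True
print(satisfies_max_consecutive(list("AAABC"), 2))   # False
```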

Minimum Distance Between Repetitions

Ensures spacing between occurrences of the same value.

Example: “At least 3 positions between same stimulus”

  • Type: Min Distance
  • Column: stimulus_id
  • Min Distance: 3

Result: Valid: [A, B, C, D, A] (3 items between the A’s) | Invalid: [A, B, C, A] (only 2 items between the A’s)
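
A minimal Python sketch of this check follows. It assumes that "distance" means the number of other items between two occurrences of the same value, which is the reading implied by the example above; the function name is hypothetical and the application's internal definition may differ.

```python
def satisfies_min_distance(values, min_dist):
    """True if every repeat of a value has at least min_dist other items in between.

    Assumption: distance = number of intervening items, inferred from the
    guide's example rather than from the application's source code.
    """
    last_seen = {}
    for pos, value in enumerate(values):
        if value in last_seen and pos - last_seen[value] - 1 < min_dist:
            return False
        last_seen[value] = pos  # only the nearest previous occurrence matters
    return True


print(satisfies_min_distance(list("ABCDA"), 3))  # True
print(satisfies_min_distance(list("ABCA"), 3))   # False
```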

Managing Constraints

Add Constraint

  1. Click “Add Constraint” button
  2. Select constraint type
  3. Choose column from your data
  4. Set parameter value (max repetitions or min distance)

Remove Constraint

  • Click the × button on any constraint card

Import Constraints

  • Click “Import” button
  • Select a TSV/CSV file with constraints
  • File must have columns: type, column, max_rep (for max_consecutive), min_dist (for min_distance)
  • Important: Column names must match your dataset exactly
  • If columns don’t match, you’ll see an error listing available columns
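
To check a constraint file before importing it, a sketch like the following may help. It is written in Python for illustration only. The column names (type, column, max_rep, min_dist) come from this guide; leaving the unused parameter cell empty is an assumption about the file layout, and `load_constraints` is a hypothetical helper, not part of the application.

```python
import csv
import io


def load_constraints(tsv_text, data_columns):
    """Parse constraint rows and reject any that reference unknown data columns."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    unknown = sorted({row["column"] for row in rows} - set(data_columns))
    if unknown:
        raise ValueError(
            f"Unknown column(s): {', '.join(unknown)}. "
            f"Available columns: {', '.join(data_columns)}"
        )
    return rows


# Hypothetical file content: one row per constraint, unused parameter left empty.
example = (
    "type\tcolumn\tmax_rep\tmin_dist\n"
    "max_consecutive\tcategory\t2\t\n"
    "min_distance\tstimulus_id\t\t3\n"
)
constraints = load_constraints(
    example, ["stimulus_id", "category", "valence", "frequency"]
)
print([c["type"] for c in constraints])
```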

Export Constraints

  • Click “Export Constraints” button
  • Saves current constraints to TSV file
  • Filename: constraints_[datafile]_YYYY-MM-DD.tsv
  • Use this file to reuse constraints with similar datasets

Constraint Numbering

  • Constraints are numbered sequentially for display
  • Numbers update automatically when constraints are removed
  • Import replaces all existing constraints (with confirmation)

3. Generation Settings

Number of Sequences

  • How many different orderings to generate
  • Range: 1 to configurable maximum (default 100, 20 in restricted mode)
  • More sequences = better diversity analysis but longer computation

Strategy

Heuristic First (Default, recommended for most cases)

  • Tries fast heuristic search first
  • Falls back to MILP if heuristic fails
  • Usually completes in seconds to minutes
  • May not always find optimal solution

MILP First (proven optimal solutions when the solver finishes in time)

  • Uses mathematical optimization (HiGHS solver)
  • Falls back to heuristic if MILP times out
  • May take minutes to hours for complex problems
  • Provides proven optimal solutions when successful

Note: In restricted mode, only the heuristic method is available.

Heuristic Max Attempts

  • How many random orderings to try
  • Higher = better chance of success but slower
  • Recommended: 5,000 for simple problems, 10,000+ for complex
  • Maximum: Configurable (default 50,000, 5,000 in restricted mode)
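
The idea of trying random orderings up to a maximum number of attempts can be sketched as follows. This is a simplified Python illustration under the assumption that each attempt is an independent shuffle; the application's heuristic may be more sophisticated, and `heuristic_search` and `no_long_runs` are hypothetical names.

```python
import random


def no_long_runs(values, max_rep):
    """True if no value repeats more than max_rep times in a row."""
    run, previous = 0, object()
    for value in values:
        run = run + 1 if value == previous else 1
        if run > max_rep:
            return False
        previous = value
    return True


def heuristic_search(items, is_valid, max_attempts=5000, seed=None):
    """Shuffle repeatedly until is_valid accepts an ordering or attempts run out."""
    rng = random.Random(seed)
    candidate = list(items)
    for attempt in range(1, max_attempts + 1):
        rng.shuffle(candidate)
        if is_valid(candidate):
            return list(candidate), attempt
    return None, max_attempts  # no valid ordering found within the budget


categories = ["noun"] * 4 + ["verb"] * 4
ordering, attempts = heuristic_search(
    categories, lambda seq: no_long_runs(seq, 2), seed=1
)
print(ordering, attempts)
```

An infeasible problem exhausts the attempt budget and returns `None`, which is why checking feasibility first saves time.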

MILP Settings (if enabled)

MILP Timeout

  • Maximum time allowed for MILP solver (seconds)
  • Increase for complex problems
  • Recommended: 120s default, 300-600s for difficult problems

MILP Solver

  • HiGHS (recommended): Fast, modern solver
  • Rglpk: Alternative solver, sometimes better for specific problems
  • ROI: Generic interface, uses available solvers

Additional Options

Verify Constraints

  • Checks generated sequences satisfy all constraints
  • Recommended: Keep enabled
  • Adds minimal computation time

Compute Diversity

  • Calculates variability between sequences
  • Compares to random baseline
  • Always computed when checked, even without constraints
  • Useful for understanding constraint impact on diversity

Show Detailed Progress

  • Displays verbose console output
  • Shows attempt numbers, solver status, timing
  • Helpful for troubleshooting

Random Seed

  • Optional number for reproducible results
  • Same seed + same data/constraints = identical sequences
  • Leave empty for random generation
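
The effect of a seed can be illustrated with Python's standard random number generator. This is an analogy only: the application uses its own generator, and `shuffled` is a hypothetical helper.

```python
import random


def shuffled(items, seed=None):
    """Return a shuffled copy; a fixed seed yields the same order every time."""
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    return order


stimuli = [f"stim_{i:03d}" for i in range(1, 11)]
print(shuffled(stimuli, seed=42) == shuffled(stimuli, seed=42))  # True
```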

4. Analysis Tabs

Feasibility Check

Purpose: Verify constraints before generation

Feasibility Analysis

  • Checks if constraints can be satisfied simultaneously
  • Identifies logical conflicts
  • Example conflict: “Max 1 consecutive of A” when the dataset has only 3 items and all of them are A
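
One conflict of this kind can be detected by counting values. The sketch below, in Python for illustration, tests a necessary condition for a single max-consecutive constraint: a value that occurs c times among n items needs enough other items to break its runs, so c must not exceed max_rep × (n − c + 1). The function name is hypothetical, and the application's feasibility check may cover more cases than this.

```python
from collections import Counter


def max_consecutive_possible(values, max_rep):
    """Necessary condition for a max-consecutive constraint on one column.

    The n - count other items can split a value's occurrences into at most
    n - count + 1 runs, each holding at most max_rep items.
    """
    n = len(values)
    return all(
        count <= max_rep * (n - count + 1)
        for count in Counter(values).values()
    )


print(max_consecutive_possible(["A", "A", "A"], 1))  # False
print(max_consecutive_possible(["A", "B", "A"], 1))  # True
```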

Complexity Analysis

  • Estimates computational difficulty
  • Helps choose appropriate strategy

Metrics Explained:

  • Stimuli: Dataset size
  • Binary Variables: Optimization problem size (stimuli × positions)
  • Constraint Equations: Total mathematical constraints generated
  • Constraint Density: Equations/variables ratio (>1.0 = very constrained)
  • Problem Scale: Logarithmic complexity measure
    • <3: Low complexity
    • 3-5: Moderate complexity
    • 5-7: High complexity
    • >7: Very high complexity

  • Difficulty: Overall assessment with method recommendation
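
The metrics with a stated formula can be reproduced by hand. The Python sketch below covers binary variables (stimuli × positions), constraint density (equations / variables), and the Problem Scale bands. The formula for Problem Scale itself is not given in this guide, so it is not computed here. Treating the band edges 3 and 5 as belonging to the higher band, and 7 to "High", is an assumption, as are the function names and the equation count in the example.

```python
def binary_variables(n_stimuli):
    """One binary variable per (stimulus, position) pair."""
    return n_stimuli * n_stimuli


def constraint_density(n_equations, n_variables):
    """Ratio of constraint equations to binary variables."""
    return n_equations / n_variables


def scale_label(problem_scale):
    """Map a Problem Scale value to the bands listed above (edge handling assumed)."""
    if problem_scale < 3:
        return "Low"
    if problem_scale < 5:
        return "Moderate"
    if problem_scale <= 7:
        return "High"
    return "Very high"


variables = binary_variables(40)              # 40 stimuli -> 1600 variables
density = constraint_density(2000, variables)  # 2000 equations is a made-up figure
print(variables, density, scale_label(4.2))
```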

What’s the Difference?

  • Feasibility: Can constraints be satisfied? (yes/no question)
  • Complexity: How hard to find solutions? (computational question)

A problem can be feasible but very complex, or infeasible but simple.

Generation Results

Generation Summary

  • Total sequences attempted
  • Successful vs. failed
  • Methods used (heuristic/MILP)
  • Average generation time

Console Output

  • Detailed log of generation process
  • Visible when “Show Detailed Progress” is enabled
  • Shows per-sequence status

Error Messages

  • Displayed if generation fails
  • Includes suggestions for resolution:
    • Check feasibility
    • Relax constraints
    • Change strategy
    • Increase timeouts

Summary Table

  • One row per sequence
  • Columns: ID, status, method, size, time, constraint satisfaction
  • Sortable and searchable

Diversity Analysis

Purpose: Understand variability between generated sequences

What is Diversity?

  • Measures how different sequences are from each other
  • Uses Kendall Tau distance (a rank-based metric: the fraction of item pairs that appear in opposite order in two sequences)
  • Range: 0 (identical) to 1 (maximally different)
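
A normalized Kendall Tau distance can be computed as the share of item pairs whose relative order differs between two sequences. The Python sketch below assumes both sequences contain the same unique items; it illustrates the metric and is not the application's code.

```python
from itertools import combinations


def kendall_tau_distance(seq_a, seq_b):
    """Normalized Kendall Tau distance between two orderings of the same unique items."""
    position_in_b = {item: i for i, item in enumerate(seq_b)}
    ranks = [position_in_b[item] for item in seq_a]
    n = len(ranks)
    if n < 2:
        return 0.0
    # A pair is discordant when the two sequences order it differently.
    discordant = sum(
        1 for i, j in combinations(range(n), 2) if ranks[i] > ranks[j]
    )
    return discordant / (n * (n - 1) / 2)


items = ["s1", "s2", "s3", "s4"]
print(kendall_tau_distance(items, items))        # 0.0
print(kendall_tau_distance(items, items[::-1]))  # 1.0
```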

Metrics Displayed:

Kendall Tau Distance

  • Mean: Average dissimilarity across all sequence pairs
  • SD: Standard deviation of distances
  • Higher mean = more diverse sequences

Random Baseline

  • Expected diversity if sequences were fully random
  • Provides context for interpreting observed diversity
  • Computed from 100 random orderings

Diversity Retention

  • Percentage: (Observed / Baseline) × 100%
  • Shows how much variability survived constraint application

Interpreting Retention:

  • <30%: Very low diversity (highly restrictive constraints)
  • 30-50%: Low diversity (moderate restrictions)
  • 50-80%: Moderate diversity (balanced constraints)
  • 80-100%: High diversity (flexible constraints)
  • >100%: Normal for unconstrained or lightly constrained sequences
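
The retention formula and the interpretation bands can be written out as a short Python sketch. The band edges (for example, whether exactly 80% counts as "Moderate" or "High") are an assumption, the function names are hypothetical, and the numbers in the example are illustrative rather than real results.

```python
def diversity_retention(observed_mean, baseline_mean):
    """Retention in percent: (observed / baseline) * 100."""
    return observed_mean / baseline_mean * 100.0


def retention_label(retention_pct):
    """Map a retention percentage to the bands listed above (edge handling assumed)."""
    if retention_pct < 30:
        return "Very low"
    if retention_pct < 50:
        return "Low"
    if retention_pct < 80:
        return "Moderate"
    if retention_pct <= 100:
        return "High"
    return "Above baseline"


# Illustrative values only: observed mean distance 0.41, random baseline 0.50.
retention = diversity_retention(0.41, 0.50)
print(round(retention, 1), retention_label(retention))
```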

Why >100%?

  • Optimization algorithms can find more diverse orderings than pure randomization
  • Common for unconstrained sequences or when constraints allow flexibility
  • Indicates the algorithm explored solution space effectively

Pairwise Distance Heatmap

  • Visual representation of all sequence pairs
  • Blue = similar, Red = different
  • Diagonal is always 0 (sequence compared to itself)

Note: Diversity requires at least 2 successful sequences and “Compute Diversity” enabled.

Sequence Preview

  • Select any generated sequence from dropdown
  • View full ordering in table format
  • Shows all columns (or selected output columns)
  • Includes position column if enabled
  • Useful for manual inspection

5. Export Results

Output Settings

Output Format

  • TSV: Tab-separated (recommended, Excel-compatible)
  • CSV: Comma-separated
  • Excel: Native .xlsx format

Columns to Include

  • Select which data columns to export
  • Use Ctrl/Cmd+click for multiple selection
  • Default: All columns

Include Header

  • First row contains column names
  • Recommended: Keep enabled

Include Position

  • Adds position column (1, 2, 3, …)
  • Shows ordering explicitly
  • Recommended: Keep enabled

Download Options

Download All Sequences (ZIP)

  • Creates ZIP archive with all sequence files
  • Individual files named: [datafile]_seq001.tsv, [datafile]_seq002.tsv, etc.
  • Archive named: [datafile]_sequences_YYYY-MM-DD.zip
  • Preserves folder structure

Download Summary

  • Exports summary table
  • Same format as selected in Output Format
  • Named: [datafile]_summary_YYYY-MM-DD.tsv
  • Contains: sequence IDs, methods, times, constraint status

File Naming

All exports use your original data filename:

  • Sequences: [datafile]_seq001.tsv
  • Archive: [datafile]_sequences_2024-12-20.zip
  • Summary: [datafile]_summary_2024-12-20.tsv
  • Constraints: constraints_[datafile]_2024-12-20.tsv
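
If you need to predict or match these names in a script, the patterns can be reproduced as in the Python sketch below. It assumes that [datafile] is the uploaded filename without its extension and that the date is the export date in YYYY-MM-DD form; `export_names` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def export_names(data_file, n_sequences, on=None, ext="tsv"):
    """Build export filenames following the patterns listed above."""
    stem = Path(data_file).stem              # assumed: filename without extension
    stamp = (on or date.today()).isoformat()  # YYYY-MM-DD
    return {
        "sequences": [f"{stem}_seq{i:03d}.{ext}" for i in range(1, n_sequences + 1)],
        "archive": f"{stem}_sequences_{stamp}.zip",
        "summary": f"{stem}_summary_{stamp}.{ext}",
        "constraints": f"constraints_{stem}_{stamp}.tsv",
    }


names = export_names("stimuli.csv", 2, on=date(2024, 12, 20))
print(names["archive"])  # stimuli_sequences_2024-12-20.zip
```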

Workflows

Basic Workflow

  1. Upload data file
  2. Add 1-2 constraints
  3. Check feasibility (optional but recommended)
  4. Click “Generate Sequences”
  5. Review results
  6. Download sequences

Advanced Workflow

  1. Upload data file
  2. Preview data, verify columns
  3. Add multiple constraints OR import constraint file
  4. Navigate to “Feasibility Check”
    • Review complexity analysis
    • Adjust constraints if difficulty is “Very High”
  5. Set generation parameters based on complexity:
    • Low/Moderate: Heuristic first, 5000 attempts
    • High: MILP first, 300s timeout
    • Very High: Consider relaxing constraints
  6. Enable “Compute Diversity” and “Verify Constraints”
  7. Generate sequences
  8. Analyze diversity metrics
  9. Preview individual sequences
  10. Configure export settings
  11. Download results
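
The parameter choices in step 5 can be summarized as a lookup table. The Python sketch below is for note-taking or scripting around the app; the setting names are hypothetical labels, not the application's actual option identifiers.

```python
# Settings suggested in step 5, keyed by the reported difficulty.
SETTINGS_BY_DIFFICULTY = {
    "Low": {"strategy": "heuristic_first", "heuristic_max_attempts": 5000},
    "Moderate": {"strategy": "heuristic_first", "heuristic_max_attempts": 5000},
    "High": {"strategy": "milp_first", "milp_timeout_s": 300},
}


def suggest_settings(difficulty):
    """Return suggested settings, or None when relaxing constraints is advised."""
    if difficulty == "Very High":
        return None  # the guide suggests relaxing constraints first
    return SETTINGS_BY_DIFFICULTY[difficulty]


print(suggest_settings("High"))
```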

Troubleshooting Workflow

If generation fails:

  1. Check Feasibility

    • Go to “Feasibility Check” tab
    • Look for constraint conflicts
    • Note: Feasible doesn’t guarantee success, especially for complex problems
  2. Review Complexity

    • If difficulty is “High” or “Very High”:
      • Increase MILP timeout (e.g., 300-600 seconds)
      • Try MILP-first strategy
      • Consider relaxing constraints
  3. Examine Console Output

    • Enable “Show Detailed Progress”
    • Look for specific error messages
    • Check which method was attempted
  4. Adjust Constraints

    • Increase max_rep (allow more consecutive repetitions)
    • Decrease min_dist (require less spacing)
    • Remove one constraint temporarily to identify conflicts
  5. Try Different Settings

    • Increase heuristic attempts (10,000-20,000)
    • Switch MILP solver (try Rglpk if HiGHS fails)
    • Generate fewer sequences initially (test with 1-2)

Server Restrictions

If using a deployed server with restricted mode:

Limitations:

  • Maximum dataset size: 120 rows (configurable)
  • Maximum sequences: 20 (configurable)
  • Maximum heuristic attempts: 5,000 (configurable)
  • MILP solvers disabled (heuristic only)

Notice: A blue information box at the top displays current limits.

Full Version: Download and install locally to remove all restrictions and access MILP optimization.

Tips and Best Practices

Data Preparation

  • Use clear, descriptive column names
  • Avoid special characters in column names
  • Ensure data types are consistent within columns
  • Remove any completely empty rows/columns
  • Test with a small subset first (20-50 rows)

Constraint Design

  • Start simple: Add one constraint, test, then add more
  • Check feasibility after each constraint addition
  • More constraints = longer computation time
  • Very restrictive constraints may have no solutions
  • Balance constraint strictness with feasibility

Performance Optimization

  • For small datasets (<50 stimuli):

    • Either method works well
    • Heuristic is faster
  • For medium datasets (50-150 stimuli):

    • Try heuristic first with 10,000 attempts
    • Use MILP if heuristic fails or you need optimality
  • For large datasets (>150 stimuli):

    • Heuristic is usually necessary
    • MILP may be very slow or time out
    • Consider simplifying constraints

Diversity Analysis

  • Always enable for research applications
  • Compare retention across different constraint sets
  • >80% retention suggests constraints don’t overly restrict diversity
  • <30% retention may indicate overly strict constraints
  • >100% retention is normal for flexible/unconstrained sequences

Reproducibility

  • Set random seed for reproducible results
  • Export constraints for future use
  • Save summary table with methods and parameters used
  • Document strategy and timeout settings

Frequently Asked Questions

Q: How long should generation take?

A: Depends on complexity:

  • Simple problems (low complexity): Seconds
  • Moderate problems: 1-5 minutes
  • Complex problems (high complexity): 5-30 minutes
  • Very complex problems: 30+ minutes or may time out

Q: What if all sequences fail?

A: Check feasibility first. If feasible:

  1. Increase timeouts
  2. Try different strategy
  3. Relax constraints
  4. Verify data is correct

Q: Can I use the same constraints with different data?

A: Yes, if column names match. Export constraints from one session, import to another. If column names don’t match, you’ll see an error with available columns.

Q: Why is diversity >100%?

A: This is normal! The optimization algorithms can find more diverse orderings than pure randomization, especially for unconstrained or lightly constrained problems.

Q: Should I use heuristic or MILP?

A:

  • Start with heuristic for speed
  • Use MILP if you need guaranteed optimal solutions
  • MILP is better for complex problems if you can wait
  • Check complexity analysis for recommendations

Q: What’s the difference between “max_consecutive” and “min_distance”?

A:

  • max_consecutive: Controls immediate repetitions (back-to-back)
  • min_distance: Controls spacing across entire sequence
  • Example: max_consecutive=1 with min_distance=3
    • Valid: [A, B, C, D, A] (A’s are 4 positions apart, with 3 items between them)
    • Invalid: [A, A, B, C] (consecutive A’s)
    • Invalid: [A, B, A, C] (A’s are only 2 positions apart, with 1 item between them)

Q: Can I interrupt generation?

A: Not easily from the UI. If it’s taking too long:

  1. Wait for timeout (MILP) or max attempts (heuristic)
  2. Refresh browser to restart (loses progress)
  3. Set lower timeout/attempts for future runs

Q: How do I cite this tool?

A: See README.md for citation information.

Q: Where is my data stored?

A:

  • Local version: Only on your computer
  • Server deployment: Temporarily on server, not permanently stored
  • Generated sequences: Downloaded to your computer
  • No data is transmitted to external services