Shiny Application User Guide
Overview
MILP Randomizer provides an interactive web interface for creating optimized stimulus orderings with custom constraints. This guide explains each feature and how to use the application effectively.
Getting Started
1. Data Input
Upload Your Data File
- Supported formats: CSV, TSV, Excel (.xlsx, .xls)
- Maximum size: Configurable (default 1000 rows, 120 in restricted mode)
- Requirements:
- First row should contain column headers
- Each row represents one stimulus/trial
- Include all relevant attributes (stimulus type, category, ID, etc.)
Example Data Structure:
stimulus_id,category,valence,frequency
stim_001,noun,positive,high
stim_002,verb,neutral,medium
stim_003,noun,negative,low
...
Data Preview Tab
- View uploaded data summary
- Check number of rows and columns
- Verify column names match your expectations
- Preview first 100 rows in interactive table
2. Define Constraints
Constraints control how stimuli can be ordered. The application supports two types:
Maximum Consecutive Repetitions
Prevents the same value from appearing too many times in a row.
Example: “No more than 2 consecutive trials of the same category”
- Type: Max Consecutive
- Column: category
- Max Repetitions: 2
Result: Valid: [A, A, B, C, C, A] | Invalid: [A, A, A, B, C]
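The rule above can be checked mechanically. The following is a minimal Python sketch, not the application's own code; the function name max_run_ok is hypothetical and chosen for illustration only.

```python
from itertools import groupby

def max_run_ok(sequence, max_rep):
    """Return True if no value appears more than max_rep times in a row."""
    # groupby splits the sequence into runs of identical neighbouring values.
    return all(len(list(run)) <= max_rep for _, run in groupby(sequence))

print(max_run_ok(["A", "A", "B", "C", "C", "A"], 2))  # True
print(max_run_ok(["A", "A", "A", "B", "C"], 2))       # False
```

To apply it to a data column, pass the values of that column in sequence order.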
Minimum Distance Between Repetitions
Ensures spacing between occurrences of the same value.
Example: “At least 3 positions between same stimulus”
- Type: Min Distance
- Column: stimulus_id
- Min Distance: 3
Result: Valid: [A, B, C, D, A] | Invalid: [A, B, C, A]
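A short Python sketch of this check follows. It assumes, based only on the valid and invalid examples above, that a minimum distance of 3 means at least 3 other positions must lie between two occurrences of the same value; the function name min_distance_ok is hypothetical and this is not the application's own code.

```python
def min_distance_ok(sequence, min_dist):
    """Return True if repeats of a value have at least min_dist positions between them."""
    last_seen = {}
    for pos, value in enumerate(sequence):
        # Number of positions strictly between this occurrence and the previous one.
        if value in last_seen and pos - last_seen[value] - 1 < min_dist:
            return False
        last_seen[value] = pos
    return True

print(min_distance_ok(["A", "B", "C", "D", "A"], 3))  # True
print(min_distance_ok(["A", "B", "C", "A"], 3))       # False
```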
Managing Constraints
Add Constraint
- Click “Add Constraint” button
- Select constraint type
- Choose column from your data
- Set parameter value (max repetitions or min distance)
Remove Constraint
- Click the × button on any constraint card
Import Constraints
- Click “Import” button
- Select a TSV/CSV file with constraints
- File must have columns: type, column, max_rep (for max_consecutive), min_dist (for min_distance)
- Important: Column names must match your dataset exactly
- If columns don’t match, you’ll see an error listing available columns
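To illustrate the expected file layout and the column check, here is a hedged Python sketch. The column names type, column, max_rep, and min_dist come from the list above; leaving the unused parameter cell empty, the function name load_constraints, and the error wording are assumptions for illustration, not the application's actual behavior.

```python
import csv
import io

def load_constraints(text, data_columns, delimiter="\t"):
    """Parse a constraint table and check each 'column' against the dataset."""
    constraints = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        if row["column"] not in data_columns:
            raise ValueError(
                f"Unknown column '{row['column']}'. "
                f"Available columns: {', '.join(data_columns)}"
            )
        # Each constraint type reads its own parameter column.
        param = "max_rep" if row["type"] == "max_consecutive" else "min_dist"
        constraints.append(
            {"type": row["type"], "column": row["column"], "value": int(row[param])}
        )
    return constraints

# Assumed layout: the parameter that does not apply is left empty.
example = (
    "type\tcolumn\tmax_rep\tmin_dist\n"
    "max_consecutive\tcategory\t2\t\n"
    "min_distance\tstimulus_id\t\t3\n"
)
columns = ["stimulus_id", "category", "valence", "frequency"]
print(load_constraints(example, columns))
```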
Export Constraints
- Click “Export Constraints” button
- Saves current constraints to TSV file
- Filename: constraints_[datafile]_YYYY-MM-DD.tsv
- Use this file to reuse constraints with similar datasets
Constraint Numbering
- Constraints are numbered sequentially for display
- Numbers update automatically when constraints are removed
- Import replaces all existing constraints (with confirmation)
3. Generation Settings
Number of Sequences
- How many different orderings to generate
- Range: 1 to configurable maximum (default 100, 20 in restricted mode)
- More sequences = better diversity analysis but longer computation
Strategy
Heuristic First (Default, recommended for most cases)
- Tries fast heuristic search first
- Falls back to MILP if heuristic fails
- Usually completes in seconds to minutes
- May not always find optimal solution
MILP First (proven optimal solutions when the solver finishes in time)
- Uses mathematical optimization (HiGHS solver)
- Falls back to heuristic if MILP times out
- May take minutes to hours for complex problems
- Provides proven optimal solutions when successful
Note: In restricted mode, only the heuristic method is available.
Heuristic Max Attempts
- How many random orderings to try
- Higher = better chance of success but slower
- Recommended: 5,000 for simple problems, 10,000+ for complex
- Maximum: Configurable (default 50,000, 5,000 in restricted mode)
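The role of the attempt limit can be sketched as a plain shuffle-and-test loop. This Python example is an assumption about the general idea of "trying random orderings"; the application's actual heuristic may be more sophisticated, and the names heuristic_search and no_adjacent_repeats are hypothetical.

```python
import random

def no_adjacent_repeats(sequence):
    """Max Consecutive = 1: neighbouring items must differ."""
    return all(a != b for a, b in zip(sequence, sequence[1:]))

def heuristic_search(items, is_valid, max_attempts=5000, seed=None):
    """Shuffle repeatedly; return (ordering, attempts used) or (None, max_attempts)."""
    rng = random.Random(seed)  # a fixed seed makes the search reproducible
    candidate = list(items)
    for attempt in range(1, max_attempts + 1):
        rng.shuffle(candidate)
        if is_valid(candidate):
            return list(candidate), attempt
    return None, max_attempts

categories = ["noun"] * 4 + ["verb"] * 4 + ["adj"] * 4
ordering, attempts = heuristic_search(categories, no_adjacent_repeats, seed=1)
print(attempts, ordering)
```

Stricter constraints make valid shuffles rarer, which is why a higher attempt limit improves the chance of success at the cost of run time.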
MILP Settings (if enabled)
MILP Timeout
- Maximum time allowed for MILP solver (seconds)
- Increase for complex problems
- Recommended: 120s default, 300-600s for difficult problems
MILP Solver
- HiGHS (recommended): Fast, modern solver
- Rglpk: Alternative solver, sometimes better for specific problems
- ROI: Generic interface, uses available solvers
Additional Options
Verify Constraints
- Checks generated sequences satisfy all constraints
- Recommended: Keep enabled
- Adds minimal computation time
Compute Diversity
- Calculates variability between sequences
- Compares to random baseline
- Always computed when checked, even without constraints
- Useful for understanding constraint impact on diversity
Show Detailed Progress
- Displays verbose console output
- Shows attempt numbers, solver status, timing
- Helpful for troubleshooting
Random Seed
- Optional number for reproducible results
- Same seed + same data/constraints = identical sequences
- Leave empty for random generation
4. Analysis Tabs
Feasibility Check
Purpose: Verify constraints before generation
Feasibility Analysis
- Checks if constraints can be satisfied simultaneously
- Identifies logical conflicts
- Example conflict: “Max 1 consecutive of A” but only 3 total items
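One simple conflict of this kind can be detected by counting. A value that occurs c times needs at least ceil(c / max_rep) separate runs, and neighbouring runs must be separated by at least one other item, so c <= max_rep × (number of other items + 1) is a necessary condition. The Python sketch below applies only this counting argument; it is not the application's feasibility analysis, and the function name is hypothetical. For instance, three items that are all A cannot satisfy "max 1 consecutive".

```python
from collections import Counter

def max_consecutive_conflicts(values, max_rep):
    """Return values that are too frequent to fit within runs of at most max_rep."""
    total = len(values)
    return [
        value
        for value, count in Counter(values).items()
        # (total - count) other items can separate at most (total - count + 1) runs.
        if count > max_rep * (total - count + 1)
    ]

print(max_consecutive_conflicts(["A", "A", "A"], 1))  # ['A']
print(max_consecutive_conflicts(["A", "A", "B"], 1))  # []
```

An empty result means only that this particular check passed; other constraints may still conflict.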
Complexity Analysis
- Estimates computational difficulty
- Helps choose appropriate strategy
Metrics Explained:
- Stimuli: Dataset size
- Binary Variables: Optimization problem size (stimuli × positions)
- Constraint Equations: Total mathematical constraints generated
- Constraint Density: Equations/variables ratio (>1.0 = very constrained)
- Problem Scale: Logarithmic complexity measure
- <3: Low complexity
- 3-5: Moderate complexity
- 5-7: High complexity
- >7: Very high complexity
- Difficulty: Overall assessment with method recommendation
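The first few metrics are simple arithmetic, sketched below in Python. The sketch assumes that the number of positions equals the number of stimuli, and it takes the equation count as an input because this guide does not state how equations are counted. The Problem Scale formula is likewise not given here, so it is omitted. The function name is hypothetical.

```python
def complexity_summary(n_stimuli, n_equations):
    """Compute binary variable count and constraint density."""
    # One binary variable per (stimulus, position) pair.
    binary_variables = n_stimuli * n_stimuli
    density = n_equations / binary_variables
    return {
        "binary_variables": binary_variables,
        "constraint_density": density,
        "very_constrained": density > 1.0,
    }

# 120 stimuli give 14,400 binary variables; 7,200 equations is an invented figure.
print(complexity_summary(120, 7200))
```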
What’s the Difference?
- Feasibility: Can constraints be satisfied? (yes/no question)
- Complexity: How hard to find solutions? (computational question)
A problem can be feasible but very complex, or infeasible but simple.
Generation Results
Generation Summary
- Total sequences attempted
- Successful vs. failed
- Methods used (heuristic/MILP)
- Average generation time
Console Output
- Detailed log of generation process
- Visible when “Show Detailed Progress” enabled
- Shows per-sequence status
Error Messages
- Displayed if generation fails
- Includes suggestions for resolution:
- Check feasibility
- Relax constraints
- Change strategy
- Increase timeouts
Summary Table
- One row per sequence
- Columns: ID, status, method, size, time, constraint satisfaction
- Sortable and searchable
Diversity Analysis
Purpose: Understand variability between generated sequences
What is Diversity?
- Measures how different sequences are from each other
- Uses Kendall Tau distance (rank-based metric counting pairs of items ordered differently)
- Range: 0 (identical) to 1 (maximally different)
Metrics Displayed:
Kendall Tau Distance
- Mean: Average dissimilarity across all sequence pairs
- SD: Standard deviation of distances
- Higher mean = more diverse sequences
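For reference, the normalized Kendall tau distance between two orderings is the fraction of item pairs whose relative order differs. The Python sketch below computes it and the mean and SD over all sequence pairs. It assumes each sequence contains the same unique items (for example stimulus IDs) and at least two of them; the function names are hypothetical and the application's implementation may differ in detail.

```python
from itertools import combinations
from statistics import mean, stdev

def kendall_tau_distance(seq_a, seq_b):
    """Fraction of item pairs whose relative order differs between two orderings."""
    rank_b = {item: pos for pos, item in enumerate(seq_b)}
    # Express seq_a in terms of each item's position in seq_b.
    mapped = [rank_b[item] for item in seq_a]
    pairs = list(combinations(range(len(mapped)), 2))
    discordant = sum(1 for i, j in pairs if mapped[i] > mapped[j])
    return discordant / len(pairs)

def pairwise_summary(sequences):
    """Mean and SD of the distance over all pairs of sequences."""
    distances = [kendall_tau_distance(a, b) for a, b in combinations(sequences, 2)]
    return mean(distances), (stdev(distances) if len(distances) > 1 else 0.0)

base = ["s1", "s2", "s3", "s4"]
print(kendall_tau_distance(base, base))        # 0.0 (identical)
print(kendall_tau_distance(base, base[::-1]))  # 1.0 (fully reversed)
print(kendall_tau_distance(base, ["s2", "s1", "s3", "s4"]))  # one swapped pair out of six
```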
Random Baseline
- Expected diversity if sequences were fully random
- Provides context for interpreting observed diversity
- Computed from 100 random orderings
Diversity Retention
- Percentage: (Observed / Baseline) × 100%
- Shows how much variability survived constraint application
Interpreting Retention:
- <30%: Very low diversity (highly restrictive constraints)
- 30-50%: Low diversity (moderate restrictions)
- 50-80%: Moderate diversity (balanced constraints)
- 80-100%: High diversity (flexible constraints)
- >100%: Normal for unconstrained or lightly constrained sequences
Why >100%?
- Optimization algorithms can find more diverse orderings than pure randomization
- Common for unconstrained sequences or when constraints allow flexibility
- Indicates the algorithm explored solution space effectively
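The retention formula and the interpretation bands above translate into a few lines of Python. This is an illustrative sketch: how the application treats values exactly on a band boundary is not stated in this guide, so the boundary handling here is an assumption, as are the function names and the example numbers.

```python
def diversity_retention(observed_mean, baseline_mean):
    """Retention = (Observed / Baseline) x 100%."""
    return observed_mean / baseline_mean * 100.0

def interpret_retention(pct):
    """Map a retention percentage to the bands listed in this guide."""
    if pct < 30:
        return "very low"
    if pct < 50:
        return "low"
    if pct < 80:
        return "moderate"
    if pct <= 100:
        return "high"
    return "above baseline"

# Invented example values: observed mean distance 0.42, random baseline 0.50.
pct = diversity_retention(0.42, 0.50)
print(round(pct, 1), interpret_retention(pct))
```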
Pairwise Distance Heatmap
- Visual representation of all sequence pairs
- Blue = similar, Red = different
- Diagonal is always 0 (sequence compared to itself)
Note: Diversity requires at least 2 successful sequences and “Compute Diversity” enabled.
Sequence Preview
- Select any generated sequence from dropdown
- View full ordering in table format
- Shows all columns (or selected output columns)
- Includes position column if enabled
- Useful for manual inspection
5. Export Results
Output Settings
Output Format
- TSV: Tab-separated (recommended, Excel-compatible)
- CSV: Comma-separated
- Excel: Native .xlsx format
Columns to Include
- Select which data columns to export
- Use Ctrl/Cmd+click for multiple selection
- Default: All columns
Include Header
- First row contains column names
- Recommended: Keep enabled
Include Position
- Adds position column (1, 2, 3, …)
- Shows ordering explicitly
- Recommended: Keep enabled
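The effect of the header and position options on a TSV export can be sketched as follows. This Python example only illustrates the resulting file layout; the column name "position" and the function name write_sequence are assumptions, not taken from the application.

```python
import csv
import io

def write_sequence(rows, columns, include_header=True, include_position=True):
    """Render an ordered list of row dicts as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    if include_header:
        writer.writerow((["position"] if include_position else []) + columns)
    for position, row in enumerate(rows, start=1):
        values = [row[col] for col in columns]
        writer.writerow(([position] if include_position else []) + values)
    return buffer.getvalue()

sequence = [
    {"stimulus_id": "stim_002", "category": "verb"},
    {"stimulus_id": "stim_001", "category": "noun"},
]
print(write_sequence(sequence, ["stimulus_id", "category"]))
```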
Download Options
Download All Sequences (ZIP)
- Creates ZIP archive with all sequence files
- Individual files named: [datafile]_seq001.tsv, [datafile]_seq002.tsv, etc.
- Archive named: [datafile]_sequences_YYYY-MM-DD.zip
- Preserves folder structure
Download Summary
- Exports summary table
- Same format as selected in Output Format
- Named: [datafile]_summary_YYYY-MM-DD.tsv
- Contains: sequence IDs, methods, times, constraint status
File Naming
All exports use your original data filename:
- Sequences: [datafile]_seq001.tsv
- Archive: [datafile]_sequences_2024-12-20.zip
- Summary: [datafile]_summary_2024-12-20.tsv
- Constraints: constraints_[datafile]_2024-12-20.tsv
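The naming patterns can be reproduced with a small helper, for example when post-processing downloads in a script. This Python sketch assumes that [datafile] is the uploaded filename without its extension; the function name export_names is hypothetical.

```python
from datetime import date
from pathlib import Path

def export_names(data_file, n_sequences, on=None, ext="tsv"):
    """Build export filenames following the patterns listed above."""
    stem = Path(data_file).stem  # assumed: [datafile] drops the extension
    stamp = (on or date.today()).isoformat()  # YYYY-MM-DD
    return {
        "sequences": [f"{stem}_seq{i:03d}.{ext}" for i in range(1, n_sequences + 1)],
        "archive": f"{stem}_sequences_{stamp}.zip",
        "summary": f"{stem}_summary_{stamp}.{ext}",
        "constraints": f"constraints_{stem}_{stamp}.tsv",
    }

names = export_names("stimuli.csv", 2, on=date(2024, 12, 20))
print(names)
```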
Workflows
Basic Workflow
- Upload data file
- Add 1-2 constraints
- Check feasibility (optional but recommended)
- Click “Generate Sequences”
- Review results
- Download sequences
Advanced Workflow
- Upload data file
- Preview data, verify columns
- Add multiple constraints OR import constraint file
- Navigate to “Feasibility Check”
- Review complexity analysis
- Adjust constraints if difficulty is “Very High”
- Set generation parameters based on complexity:
- Low/Moderate: Heuristic first, 5000 attempts
- High: MILP first, 300s timeout
- Very High: Consider relaxing constraints
- Enable “Compute Diversity” and “Verify Constraints”
- Generate sequences
- Analyze diversity metrics
- Preview individual sequences
- Configure export settings
- Download results
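The parameter choices in this workflow can be written down as a lookup table, which is handy for documenting runs. The Python sketch below restates only the suggestions listed above; the dictionary keys and the function name are hypothetical and do not correspond to actual application settings.

```python
def recommend_settings(difficulty):
    """Map a difficulty label to the settings suggested in the workflow above."""
    heuristic = {"strategy": "heuristic_first", "heuristic_max_attempts": 5000}
    table = {
        "Low": heuristic,
        "Moderate": heuristic,
        "High": {"strategy": "milp_first", "milp_timeout_seconds": 300},
    }
    # "Very High" has no suggested settings: the guide advises relaxing constraints.
    return dict(table[difficulty]) if difficulty in table else None

print(recommend_settings("High"))
print(recommend_settings("Very High"))  # None: relax constraints first
```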
Troubleshooting Workflow
If generation fails:
- Check Feasibility
  - Go to “Feasibility Check” tab
  - Look for constraint conflicts
  - Note: Feasible doesn’t guarantee success, especially for complex problems
- Review Complexity
  - If difficulty is “High” or “Very High”:
    - Increase MILP timeout (e.g., 300-600 seconds)
    - Try MILP-first strategy
    - Consider relaxing constraints
- Examine Console Output
  - Enable “Show Detailed Progress”
  - Look for specific error messages
  - Check which method was attempted
- Adjust Constraints
  - Increase max_rep (allow more consecutive repetitions)
  - Decrease min_dist (require less spacing)
  - Remove one constraint temporarily to identify conflicts
- Try Different Settings
  - Increase heuristic attempts (10,000-20,000)
  - Switch MILP solver (try Rglpk if HiGHS fails)
  - Generate fewer sequences initially (test with 1-2)
Server Restrictions
If using a deployed server with restricted mode:
Limitations:
- Maximum dataset size: 120 rows (configurable)
- Maximum sequences: 20 (configurable)
- Maximum heuristic attempts: 5,000 (configurable)
- MILP solvers disabled (heuristic only)
Notice: A blue information box at the top displays current limits.
Full Version: Download and install locally to remove all restrictions and access MILP optimization.
Tips and Best Practices
Data Preparation
- Use clear, descriptive column names
- Avoid special characters in column names
- Ensure data types are consistent within columns
- Remove any completely empty rows/columns
- Test with a small subset first (20-50 rows)
Constraint Design
- Start simple: Add one constraint, test, then add more
- Check feasibility after each constraint addition
- More constraints = longer computation time
- Very restrictive constraints may have no solutions
- Balance constraint strictness with feasibility
Performance Optimization
- For small datasets (<50 stimuli):
  - Either method works well
  - Heuristic is faster
- For medium datasets (50-150 stimuli):
  - Try heuristic first with 10,000 attempts
  - Use MILP if heuristic fails or you need optimality
- For large datasets (>150 stimuli):
  - Heuristic is usually necessary
  - MILP may be very slow or time out
  - Consider simplifying constraints
Diversity Analysis
- Always enable for research applications
- Compare retention across different constraint sets
- >80% retention suggests constraints don’t overly restrict
- <30% retention may indicate overly strict constraints
- >100% retention is normal for flexible/unconstrained sequences
Reproducibility
- Set random seed for reproducible results
- Export constraints for future use
- Save summary table with methods and parameters used
- Document strategy and timeout settings
Frequently Asked Questions
Q: How long should generation take?
A: Depends on complexity:
- Simple problems (low complexity): Seconds
- Moderate problems: 1-5 minutes
- Complex problems (high complexity): 5-30 minutes
- Very complex problems: 30+ minutes, or the solver may time out
Q: What if all sequences fail?
A: Check feasibility first. If feasible:
- Increase timeouts
- Try different strategy
- Relax constraints
- Verify data is correct
Q: Can I use the same constraints with different data?
A: Yes, if column names match. Export constraints from one session, import to another. If column names don’t match, you’ll see an error with available columns.
Q: Why is diversity >100%?
A: This is normal! The optimization algorithms can find more diverse orderings than pure randomization, especially for unconstrained or lightly constrained problems.
Q: Should I use heuristic or MILP?
A:
- Start with heuristic for speed
- Use MILP if you need guaranteed optimal solutions
- MILP is better for complex problems if you can wait
- Check complexity analysis for recommendations
Q: What’s the difference between “max_consecutive” and “min_distance”?
A:
- max_consecutive: Controls immediate repetitions (back-to-back)
- min_distance: Controls spacing across entire sequence
- Example: max_consecutive=1 with min_distance=3
- Valid: [A, B, C, D, A] (A’s are 4 apart)
- Invalid: [A, A, B, C] (consecutive A’s)
- Invalid: [A, B, A, C] (A’s only 2 apart)
Q: Can I interrupt generation?
A: Not easily from the UI. If it’s taking too long:
- Wait for timeout (MILP) or max attempts (heuristic)
- Refresh browser to restart (loses progress)
- Set lower timeout/attempts for future runs
Q: How do I cite this tool?
A: See README.md for citation information.
Q: Where is my data stored?
A:
- Local version: Only on your computer
- Server deployment: Temporarily on server, not permanently stored
- Generated sequences: Downloaded to your computer
- No data is transmitted to external services