Meeting Notes on Password Cracking Benchmarks

Discussion on Statistical Terminology and Graph Analysis

  • Professor Vareen agreed that limited conclusions can be drawn from the graphs.
  • The graphs show data within certain intervals.
    • With filters applied, values fall between 300 and 400.
    • With 1500 Shari, values reach up to 4000.
  • Hardware acceleration on the CPU might reduce the GPU's relative advantage.
  • The CPU performs strongly.

Pseudo Time Index and Experiment Independence

  • Pseudo time index refers to the order of the experiment.
  • Each experiment is independent.
  • Ideally, with consistent parameters, a flat line is expected.
  • Variations exist despite the expectation of equal parameters.
  • Password changes are a separate consideration.
  • In an ideal scenario with proper setup and uniform password extraction, the result should be flat.
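The "flat line" expectation above can be quantified. A minimal sketch (the hash rates are hypothetical, not measured data) that expresses run-to-run spread as a coefficient of variation, where a perfectly flat benchmark series would score 0:

```python
import statistics

def relative_spread(speeds):
    """Coefficient of variation (stdev / mean) of a benchmark series.

    A perfectly flat line across the pseudo time index gives 0.0;
    larger values mean run-to-run variation despite equal parameters.
    """
    return statistics.stdev(speeds) / statistics.mean(speeds)

# Hypothetical hash rates (MH/s) from five repeated, independent runs;
# the outlier at 398 stands in for the unexplained peaks discussed here.
runs = [310.0, 305.0, 398.0, 312.0, 308.0]
print(f"relative spread: {relative_spread(runs):.3f}")
```

Comparing this number across algorithms and tools would make "more stable" or "more oscillation" claims concrete.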

Analysis of MD5 and Core Usage

  • MD5 shows little oscillation.
  • Core usage on CPU was examined.
  • Inconsistencies in behavior observed.
  • GPU usage is divided between the HIP and OpenCL backends.
  • Differences may stem from how the code was optimized by whoever wrote it.
  • Workload profile 4 on MD5 seems better with HIP, possibly due to its more native GPU integration, but results are inconsistent.

Variance Calculation and Graph Stability

  • Varying core numbers previously showed unusual behavior.
  • The increase reaches up to 1500, with others nearing 1000, indicating rough alignment.
  • Hashcat appears more stable, even beyond the interpolated range.

Consultation with Cristiano and Parameter Control

  • Cristiano was shown some graphs initially to assess the approach's validity.
  • Without precise knowledge of benchmark parameters and passwords, interpreting peaks is challenging.
  • Peaks may result from password variations affecting memory loading and cache times.
  • Experiments need tighter controls for clearer explanations of peak occurrences.

Password Handling and Brute Force Approaches

  • Ensure consistent passwords, possibly through brute force using a password list.
  • Two approaches: using masks and using rules.
  • Alternative: benchmark with 100 MD5 passwords over a set time to observe the behavior; control password characteristics (e.g., length between 8 and 12 characters).
  • Hashcat allows selecting a random subset from a dictionary, but masks still need to be defined.
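As a sketch of what a mask commits you to, the snippet below computes the keyspace of a hashcat-style mask (using hashcat's built-in charset sizes: ?l/?u 26, ?d 10, ?s 33, ?a 95) and shows one way to keep the sampled password subset identical across benchmark repetitions. The fixed-seed helper is a hypothetical control script, not a hashcat flag:

```python
import math
import random

# Hashcat built-in mask character classes and their charset sizes
# (?l lower, ?u upper, ?d digit, ?s specials, ?a = all printable ASCII).
CHARSET_SIZE = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95}

def mask_keyspace(mask: str) -> int:
    """Number of candidates a mask such as '?l?l?l?l?l?l?l?l' generates."""
    return math.prod(CHARSET_SIZE[c] for c in mask.split("?")[1:])

def fixed_sample(wordlist, n, seed=0):
    """Fixed-seed random subset, so every benchmark repetition sees the
    same candidate passwords (hypothetical control, not a hashcat feature)."""
    return random.Random(seed).sample(wordlist, n)

print(mask_keyspace("?l" * 8))  # 26**8 candidates for 8 lowercase characters
```

Controlling both the mask keyspace and the sampled subset would remove two of the uncontrolled variables discussed above.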

Defining Research Questions and Variables

  • Determine what the measurements aim to demonstrate.
  • Consider altering length, complexity, or specific profiles.
  • Current variables: algorithm, benchmark speed, and benchmark order.
  • A model with only speed as a variable is limiting.
  • Define research questions precisely.
  • Example: Estimate the advantage of a GPU attacker over a CPU attacker for various algorithms.
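The example question above reduces to ratios of benchmark speeds per algorithm. A minimal sketch with entirely hypothetical numbers (the real figures would come from the hashcat/John runs under discussion):

```python
# Hypothetical per-algorithm benchmark speeds; units may differ per
# algorithm, but the GPU/CPU ratio is unit-free.
cpu_speed = {"MD5": 1200.0, "SHA-256": 450.0, "bcrypt": 100.0}
gpu_speed = {"MD5": 60000.0, "SHA-256": 9000.0, "bcrypt": 800.0}

# GPU attacker's advantage over a CPU attacker, per algorithm.
gpu_advantage = {alg: gpu_speed[alg] / cpu_speed[alg] for alg in cpu_speed}

for alg, ratio in sorted(gpu_advantage.items(), key=lambda kv: -kv[1]):
    print(f"{alg:8s} GPU advantage: {ratio:.0f}x")
```

Framing the research question as "estimate this ratio for each algorithm" turns the loose variable list into a single measurable quantity.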

Parameter Impact and Experimental Design

  • Password length may not necessarily impact all algorithms: hash functions process fixed-size blocks, so short passwords of different lengths still fit within a single block.
  • Hypothesis: Increasing parameters should reduce GPU advantage.
  • Systematically design experiments.

Splitting Research Questions

  • Divide research into two parts:
    • CPU vs. GPU comparison with the same algorithm.
    • Identifying the most secure algorithm for different situations.
  • Compare algorithms with standard parameters on Linux.

Linux Algorithms and Attacker Perspective

  • MD5 and SHA are used within more complex constructions such as md5crypt or sha512crypt, which apply many iterations.
  • These iterations likely slow down CPU and GPU performance by a similar factor.
  • Focus on the attacker's perspective: which tools and hardware are most beneficial.
  • If John the Ripper performs better on CPU than Hashcat, an attacker might opt for multiple CPUs.
  • Implementation specifics are less relevant from this viewpoint.
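The iteration argument above can be sketched numerically: if md5crypt applies on the order of 1000 MD5 rounds per guess, both platforms' raw rates divide by the same factor, leaving the GPU-vs-CPU ratio untouched. The rates below are hypothetical:

```python
# md5crypt applies roughly this many MD5 rounds per password guess.
ITERATIONS = 1000

raw_cpu = 1_200_000_000    # plain MD5, H/s (hypothetical)
raw_gpu = 60_000_000_000   # plain MD5, H/s (hypothetical)

# Iterated construction: both raw rates shrink by the same factor.
crypt_cpu = raw_cpu / ITERATIONS
crypt_gpu = raw_gpu / ITERATIONS

print(raw_gpu / raw_cpu)      # GPU advantage on plain MD5
print(crypt_gpu / crypt_cpu)  # identical advantage on the iterated hash
```

This is why iteration counts alone do not change which hardware an attacker prefers; memory hardness, discussed later, is what breaks the symmetry.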

Tool Comparison and Ethical Considerations

  • Comparing tools like John the Ripper and Hashcat may be more relevant from an attacker's perspective than comparing benchmark implementations.
  • Try cracking the same password list with different tools to compare their effectiveness.
  • Ethical question: Should all research material be made available, considering potential misuse?
  • In previous work, passwords were not made fully available for ethical reasons.

Core Scaling and Memory Usage

  • Moving from one core to two yields nearly double (200%) the performance with Hashcat and MD5.
  • Performance reaches nearly 12 times the baseline with 16 cores, in line with expectations.
  • 16 cores consistently perform below 15 cores, suggesting memory-sharing effects.
  • Memory usage: 64K of memory is allocated; accesses within this space depend on the password and are hard for an attacker to predict.
  • True memory-hard algorithms use significantly more memory.
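The core-scaling observations above can be summarized as parallel efficiency. A minimal sketch with hypothetical hash rates shaped like the measurements described (near-perfect doubling at 2 cores, roughly 12x at 16, and 16 cores slightly below 15):

```python
def parallel_efficiency(base_speed, speed, cores):
    """Per-core speedup relative to the single-core baseline."""
    return (speed / base_speed) / cores

# Hypothetical MD5 hash rates (MH/s) at increasing core counts.
rates = {1: 100.0, 2: 198.0, 8: 760.0, 15: 1230.0, 16: 1200.0}

for cores, speed in rates.items():
    eff = parallel_efficiency(rates[1], speed, cores)
    print(f"{cores:2d} cores: {speed / rates[1]:5.2f}x speedup, {eff:.0%} efficiency")
```

A drop in efficiency from 15 to 16 cores, as in these made-up figures, would be consistent with the memory-sharing explanation rather than a measurement fluke.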

Tool Optimization and Comparative Analysis

  • Tools may be optimized for specific hardware (CPU vs. GPU).
  • Consider the need for Perl and shell scripting for cracking, given that custom scripts might be required.
  • Comparing memory-hard vs. non-memory-hard algorithms within the same tool (e.g., John the Ripper) gives a fairer comparison.
  • Caution against comparing results obtained with fundamentally different tools and techniques.

John the Ripper Peculiarities and Parameter Randomization

  • John the Ripper shows more oscillation.
  • Unclear why some core counts perform better than higher core counts.
  • Doubts persist regarding password selection and parameter changes during benchmarking.
  • Potential parameter randomization might explain certain peaks.
  • Tests with fixed parameters are needed to check for consistency.
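Once parameters are pinned down, the suspicious peaks can be flagged mechanically rather than by eye. A minimal sketch, assuming peaks appear as outliers from the series median:

```python
import statistics

def flag_peaks(series, threshold=2.0):
    """Indices whose value deviates from the median by more than
    `threshold` sample standard deviations: candidate peaks to
    investigate once benchmark parameters are fixed."""
    med = statistics.median(series)
    sd = statistics.stdev(series)
    return [i for i, v in enumerate(series) if abs(v - med) > threshold * sd]

# Hypothetical benchmark series with one suspicious spike at index 4.
print(flag_peaks([100.0, 101.0, 99.0, 100.0, 150.0, 100.0]))
```

Applying the same detector before and after fixing passwords and parameters would show directly whether the peaks come from the experimental setup or from the tools themselves.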

Experiment Stabilization and Controlled Conditions

  • Establish controlled experimental conditions.
  • Prioritize understanding and stabilizing tool behavior.
  • Aim for similar conditions across tests for coherent results.