High Frequency Data and the Flash Crash

The Flash Crash (May 6, 2010)

  • A day that profoundly impacted financial data modeling.
  • The Dow Jones Industrial Average went down about 220 points - a normal fluctuation.
  • However, during a 36-minute stretch in the afternoon, a dramatic shift occurred.
  • The market plummeted by about 10%, causing panic.
  • Theories abound, but no definitive cause has been determined.
  • The market rebounded just as quickly.
  • This ushered in the era of "high frequency data."

High Frequency Data

  • Definition of Frequency: The time elapsed between rows of data.
  • Past Definition: Daily data was once considered high frequency.
  • Flash Crash Impact: Showed the need for even higher frequency examination.
  • Not Hourly: Hourly data is insufficient, as the event lasted 36 minutes.
  • Not Minute/Second: Even minute-by-minute or second-by-second might not be precise enough.
  • Gold standard: Transaction-by-transaction (tick-by-tick) data, which records every stock transaction (buy or sell).

Transaction Level Data: Caterpillar Example (January 4, 2010)

  • Example: Transaction-level data on Caterpillar posted on Blackboard.
  • Market opens at 9:30 AM East Coast time.
  • Transactions occur within the first second of the day, varying in size and price.
  • Not a Cleanly Defined Time Dimension: The time elapsed between rows is not precisely known.
  • Speed of Modern Financial Trading: Many transactions can occur within a single second.
  • Example: Nearly 40,000 transactions in one day for Caterpillar.
  • Computational Power: Examining high-frequency data requires significant computational power because of the sheer volume of transactions.

Visualizing Caterpillar's Stock Price

  • The presenter plotted the price of Caterpillar stock over the course of the day.
  • The Y-axis shows the price, indicating that it started below $58, briefly went above $59, and ended around $58.50.
  • Daily Data: If only daily data is examined, the intraday peak would be missed.
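
To make this concrete, here is a minimal Python sketch on made-up prices (not the actual Caterpillar data). A view that keeps only the day's open and close reports a modest gain, while the tick series briefly runs much higher. The helpers `open_to_close_return` and `max_move_above_open` are hypothetical names used for illustration.

```python
# Hypothetical tick prices for one day (illustrative only, not real Caterpillar trades).
ticks = [57.95, 58.10, 58.40, 58.75, 59.05, 58.90, 58.60, 58.50]


def open_to_close_return(prices):
    """Return over the day as seen by data that keeps only the open and close."""
    return prices[-1] / prices[0] - 1


def max_move_above_open(prices):
    """Largest percentage move above the opening price at any tick during the day."""
    return max(prices) / prices[0] - 1


daily_view = open_to_close_return(ticks)
tick_view = max_move_above_open(ticks)
print(f"open-to-close: {daily_view:.2%}, peak above open: {tick_view:.2%}")
```

The peak move is roughly twice the open-to-close move here, and nothing in the two-number daily summary hints at it.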

Peculiarities of High Frequency Data

  • Analogy: When examining fabric, daily data is like looking with the naked eye, whereas tick data is like looking through a microscope.

Unequally Spaced Time Intervals

  • Issue: Transactions do not occur at equally spaced intervals.
  • Impact: Standard time-series methods generally assume equally spaced observations, so this type of data is tricky to work with.
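
The irregular spacing becomes obvious once the time between consecutive rows is computed. This sketch assumes trade timestamps stored as strings with fractional seconds; the timestamps and the `inter_trade_durations` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical trade timestamps (illustrative only, not actual trades).
timestamps = [
    "2010-01-04 09:30:00.120",
    "2010-01-04 09:30:00.480",
    "2010-01-04 09:30:00.490",
    "2010-01-04 09:30:02.000",
    "2010-01-04 09:30:07.250",
]


def inter_trade_durations(stamps):
    """Seconds elapsed between each pair of consecutive trades."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


durations = inter_trade_durations(timestamps)
print(durations)  # gaps vary from hundredths of a second to several seconds
```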

Discrete Price Movement

  • Price increments appear in discrete chunks (e.g., 0¢, 5¢, 2¢, 1¢).
  • Historical Context: Prior to 1997, prices were measured in fractions of a dollar (eighths).
  • Regardless of the measurement unit (eighths or cents), movements at the tick-by-tick level are chunky.
  • Plotting price movement instead of price shows that almost all price movements are 0¢, 1¢, or 2¢.
  • When price movements are discrete, specialized statistical methods are required.
  • Example: The number of doctor office visits is a discrete count, unlike health care spending, and likewise calls for specialized count-data methods.
  • Key point: Caterpillar's stock price does not move continuously.
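
A minimal sketch of plotting price movement instead of price: convert each price to integer cents (which avoids floating-point noise) and tabulate the tick-to-tick changes. The price series is made up for illustration, and `price_changes_in_cents` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical tick-by-tick prices in dollars (illustrative only).
prices = [58.00, 58.01, 58.01, 58.00, 58.02, 58.02, 58.01, 58.01, 58.03, 58.03]


def price_changes_in_cents(prices):
    """Tick-to-tick price changes, expressed as whole cents."""
    cents = [round(p * 100) for p in prices]  # round() removes float artifacts like 5800.999...
    return [later - earlier for earlier, later in zip(cents, cents[1:])]


changes = price_changes_in_cents(prices)
counts = Counter(changes)
print(sorted(counts.items()))  # only a handful of integer values ever appear
```

Because every change is a small integer number of cents, a histogram of `changes` shows a few spikes rather than a smooth bell curve.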

Non-Constant Volume

  • Issue: The volume involved in transactions is not constant.
  • Impact: Cannot be ignored at the high frequency level.
  • Pattern: Activity is heavier at the open and close and thinner at lunchtime. Even now, the basic human need to eat takes priority over financial trading.
  • Not all rows of data carry equal weight: a trade of 100 shares is not the same as a trade of 8,000 shares.
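
One common way to let large trades count for more is the volume-weighted average price (VWAP). This is a standard measure, not necessarily the approach used in the lecture, and the trades below are hypothetical.

```python
# Hypothetical trades as (price, shares); illustrative values only.
trades = [(58.00, 8000), (58.05, 100), (58.10, 100)]


def simple_average_price(trades):
    """Treats every row equally, regardless of how many shares changed hands."""
    return sum(price for price, _ in trades) / len(trades)


def volume_weighted_average_price(trades):
    """Weights each trade's price by its share volume."""
    total_shares = sum(shares for _, shares in trades)
    return sum(price * shares for price, shares in trades) / total_shares


simple = simple_average_price(trades)
vwap = volume_weighted_average_price(trades)
print(f"simple average: {simple:.4f}, VWAP: {vwap:.4f}")
```

The 8,000-share trade pulls the VWAP toward $58.00, while the simple average treats the two 100-share trades as equally important.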

Multiple Transactions within Single Second Windows

  • Transactions can occur at different prices within the same second.
  • Implication: Drilling down to the millisecond is becoming increasingly common.
  • Computational Challenges: Millisecond data is much larger, which makes it hard to handle or share by email; it typically has to be loaded onto a server.
  • Time Dimension: The definition of time becomes elastic, making the data difficult to work with.
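
Grouping trades by the whole second in which they occurred shows how a single one-second timestamp can hide several trades at different prices. This sketch assumes millisecond timestamps stored as strings; the data and the `prices_by_second` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, price) trades, illustrative only.
trades = [
    ("2010-01-04 09:30:00.120", 57.95),
    ("2010-01-04 09:30:00.480", 57.97),
    ("2010-01-04 09:30:00.490", 57.96),
    ("2010-01-04 09:30:01.050", 57.98),
]


def prices_by_second(trades):
    """Group trade prices by the whole second in which they occurred."""
    groups = defaultdict(list)
    for stamp, price in trades:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        groups[t.replace(microsecond=0)].append(price)  # truncate to the second
    return dict(groups)


groups = prices_by_second(trades)
for second, prices in groups.items():
    print(second.time(), len(prices), "trade(s) at", sorted(set(prices)))
```

At one-second resolution the first three trades share a timestamp, so their order, and therefore the price path within that second, is only recoverable at the millisecond level.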

Statistical Mirages

  • Quirks can lead to statistical mirages.
  • High frequency data requires special care.

Non-Synchronous Trading

  • Non-synchronous trading is a key cause of statistical mirages.
  • Example: Stocks A and B do not influence each other, but both respond to the same news, and A trades more frequently than B. When news arrives at the end of the day, A may react today, whereas B does not react until tomorrow. Statistically, A then appears to lead B.
  • Definition: A and B are not synchronized; A reacts to news more quickly.
  • Daily Data: This is not a concern with daily data, because all major stocks trade every day.
  • High Frequency: At the tick level, there are moments when a stock trades and moments when it doesn't.
  • This creates spurious patterns, such as the false lead-lag relationship described above.
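
The lead-lag mirage can be reproduced with a small simulation. Its assumptions are not from the lecture: both stocks respond to the same normally distributed news each period, A trades every period, and B fails to trade with probability 0.5. When B does not trade, its return is recorded as 0 and carried forward to its next trade. Neither stock affects the other, yet A appears to lead B. The helper names are hypothetical.

```python
import numpy as np


def observe_with_nontrading(true_returns, p_no_trade, rng):
    """Observed returns when the stock skips trading with probability p_no_trade.

    A skipped period records a return of 0; its true return is carried forward
    and appears, accumulated, at the next period in which the stock trades.
    """
    traded = rng.random(len(true_returns)) >= p_no_trade
    observed = np.zeros(len(true_returns))
    pending = 0.0
    for t, r in enumerate(true_returns):
        pending += r
        if traded[t]:
            observed[t] = pending
            pending = 0.0
    return observed


def lagged_corr(x, y, lag=1):
    """Correlation between x at time t and y at time t + lag."""
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]


rng = np.random.default_rng(0)
n = 200_000
news = rng.normal(0.0, 1.0, n)                # common news both stocks respond to
a = news                                      # A trades every period
b_true = news + rng.normal(0.0, 1.0, n)       # B: same news plus its own noise
b = observe_with_nontrading(b_true, 0.5, rng)  # B often reacts a period late

a_leads_b = lagged_corr(a, b)  # A today vs B tomorrow
b_leads_a = lagged_corr(b, a)  # B today vs A tomorrow
print(f"A today vs B tomorrow: {a_leads_b:.3f}, B today vs A tomorrow: {b_leads_a:.3f}")
```

`a_leads_b` comes out clearly positive while `b_leads_a` stays near zero. A naive analyst would conclude that A predicts B, but the pattern comes entirely from B's missed trades.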

Calculating Return and Autocorrelation

  • Issue: It is not even clear that the return is the quantity we want to study.
  • At the daily level, Caterpillar's return today is not correlated with its return yesterday or two days ago.
  • At the transaction level, however, there are large negative correlations: if Caterpillar's price went up on the previous transaction, it tends to go down over the next two or even three. There is a temptation to build a trading strategy on that.
    • Problem: The pattern on the screen that tempts us into a trading strategy is a statistical mirage.
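
The same carry-forward mechanism produces negative autocorrelation in a single stock's observed returns. This sketch assumes i.i.d. normal true returns and a stock that skips trading with probability 0.5. The mean is set deliberately large relative to the volatility so the effect is easy to see. The helper names are hypothetical.

```python
import numpy as np


def observe_with_nontrading(true_returns, p_no_trade, rng):
    """Observed returns when the stock skips trading with probability p_no_trade.

    A skipped period records a return of 0; its true return is carried forward
    and appears, accumulated, at the next period in which the stock trades.
    """
    traded = rng.random(len(true_returns)) >= p_no_trade
    observed = np.zeros(len(true_returns))
    pending = 0.0
    for t, r in enumerate(true_returns):
        pending += r
        if traded[t]:
            observed[t] = pending
            pending = 0.0
    return observed


def lag1_autocorr(x):
    """Correlation between consecutive observations of x."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


rng = np.random.default_rng(1)
n = 200_000
drifting = rng.normal(1.0, 1.0, n)  # true returns with nonzero mean (mu = 1, sigma = 1)
no_drift = rng.normal(0.0, 1.0, n)  # true returns with mean zero

ac_drift = lag1_autocorr(observe_with_nontrading(drifting, 0.5, rng))
ac_no_drift = lag1_autocorr(observe_with_nontrading(no_drift, 0.5, rng))
ac_always_trades = lag1_autocorr(observe_with_nontrading(drifting, 0.0, rng))
print(ac_drift, ac_no_drift, ac_always_trades)
```

Only the first case shows a clearly negative lag-1 autocorrelation. With a zero mean, or with a stock that trades every period, it stays near zero. The true returns here are independent, so this negative autocorrelation is not a signal that can be traded.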

Mathematical Derivation of One Transaction Correlation

The result of the derivation can be written as follows:

$$\text{One-Transaction Correlation} = -\frac{\mu^2}{\sigma^2}\,(1 - \pi)$$

Where
μ: The mean return throughout the day.
σ²: The variance of the return that day.
π: The probability that the stock trades in a given period, so 1 − π is the probability that it does not trade (non-synchronous trading).

  • The minus sign multiplies terms that are all non-negative (μ², σ², and the probability factor 1 − π), so the correlation can only be negative or zero. This is what produces the big negative spike.
  • Non-Synchronous Trading: Stocks such as Tesla or Starbucks are certain to trade within a day, and when a stock always trades, non-synchronous trading is not a concern.
  • Artifact: The negative values are an artifact of non-synchronous trading, not an exploitable pattern.