High Frequency Data and the Flash Crash
The Flash Crash (May 6, 2010)
- A day that profoundly impacted financial data modeling.
- The Dow Jones Industrial Average had fallen about 220 points, a normal fluctuation.
- However, during a 36-minute stretch in the afternoon, the market plummeted by about 10%, causing panic.
- The market rebounded just as quickly.
- Theories abound, but no definitive cause has been determined.
- This ushered in the era of "high frequency data."
High Frequency Data
- Definition of Frequency: The time elapsed between rows of data.
- Past Definition: Daily data was once considered high frequency.
- Flash Crash Impact: Showed the need for even higher frequency examination.
- Not Hourly: Hourly data is insufficient, because the entire event lasted only 36 minutes.
- Not Minute/Second: Even minute-by-minute or second-by-second data might not be precise enough.
- Gold standard: Transaction-by-transaction (tick-by-tick) data, with a row for every stock transaction (buy or sell).
Transaction Level Data: Caterpillar Example (January 4, 2010)
- Example: Transaction-level data on Caterpillar posted on Blackboard.
- The market opens at 9:30 AM Eastern Time.
- Transactions occur within the first second of the day, varying in size and price.
- No Cleanly Defined Time Dimension: The time elapsed between rows is not fixed; it varies from row to row.
- Speed of Modern Financial Trading: Caterpillar alone had nearly 40,000 transactions in one day.
- Computational Power: Examining high-frequency data requires significant computational power because of the sheer volume of transactions.
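Because the time between rows is not fixed, one of the first things to look at in tick data is the gap between consecutive timestamps. The sketch below computes those gaps with the standard library. The timestamps and the format string are hypothetical, made up for illustration; they are not taken from the Caterpillar file.

```python
from datetime import datetime

# Hypothetical tick timestamps (illustrative only, not the Caterpillar data).
ticks = [
    "2010-01-04 09:30:00.120",
    "2010-01-04 09:30:00.480",
    "2010-01-04 09:30:00.910",
    "2010-01-04 09:30:02.050",
    "2010-01-04 09:30:07.300",
]


def inter_arrival_seconds(timestamps):
    """Seconds elapsed between consecutive rows of tick data."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(parsed, parsed[1:])]


gaps = inter_arrival_seconds(ticks)
print(gaps)
```

The gaps range from well under a second to several seconds, which is exactly why "frequency" is not a single number for this kind of data.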
Visualizing Caterpillar's Stock Price
- The presenter plotted the price of Caterpillar stock over the course of the day.
- The Y-axis shows the price: it started below …, briefly went above …, and ended around ….
- Daily Data: If only daily data is examined, the intraday peak would be missed.
Peculiarities of High Frequency Data
- Analogy: If the market is a piece of fabric, daily data is like looking at it with the naked eye, whereas tick data is like looking at it under a microscope.
Unequally Spaced Time Intervals
- Issue: Transactions do not occur at equally spaced intervals.
- Impact: Most standard time-series tools assume equally spaced observations, so this type of data is tricky to work with.
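One common workaround for unequal spacing (not necessarily the approach used in this course) is previous-tick sampling: lay down an equally spaced time grid and, at each grid point, record the price of the most recent trade. The sketch below assumes hypothetical trades given as (seconds after the open, price) pairs sorted by time; the function name `previous_tick_grid` is illustrative.

```python
from bisect import bisect_right

# Hypothetical trades: (seconds after the 9:30 open, price in dollars), sorted by time.
trades = [(0.12, 57.10), (0.48, 57.12), (0.91, 57.11), (2.05, 57.15), (7.30, 57.13)]


def previous_tick_grid(trades, step, end):
    """Sample irregular trades onto an equally spaced grid from 0 to end.

    At each grid time, report the price of the most recent trade at or before
    that time (last observation carried forward), or None if nothing has traded yet.
    """
    times = [t for t, _ in trades]
    grid = []
    for i in range(int(end // step) + 1):
        t = i * step
        idx = bisect_right(times, t) - 1  # index of last trade with time <= t
        grid.append((t, trades[idx][1] if idx >= 0 else None))
    return grid


grid = previous_tick_grid(trades, step=1.0, end=8.0)
for t, price in grid:
    print(t, price)
```

Note the design trade-off: during quiet stretches (here, seconds 3 to 7) the carried-forward price produces artificial zero returns, which is the same ingredient behind the non-synchronous trading mirages discussed later.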
Discrete Price Movement
- Price increments appear in discrete chunks (e.g., 0¢, 5¢, 2¢, 1¢).
- Historical Context: Prior to 1997, prices were measured in fractions of a dollar (eighths).
- Regardless of the measurement unit (eighths or cents), movements at the tick-by-tick level are chunky.
- Plotting price changes rather than price levels shows that almost all movements are 0¢, 1¢, or 2¢.
- Because price movements are discrete, specialized statistical methods are required.
- Example: The number of doctor office visits is a discrete count, unlike health care spending, and calls for different statistical methods for that reason.
- Key point: Caterpillar's stock price does not move continuously; it jumps in discrete increments.
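A quick way to see the discreteness is to tabulate tick-to-tick price changes. The sketch below uses a hypothetical price sequence (not the Caterpillar data) stored as integer cents, an assumption made so that differences are exact rather than floating-point approximations of dollars.

```python
from collections import Counter

# Hypothetical consecutive trade prices in integer cents (illustrative only).
prices_cents = [5710, 5710, 5711, 5711, 5710, 5712, 5712, 5711, 5711, 5711]


def price_change_counts(prices):
    """Frequency table of tick-to-tick price changes."""
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    return Counter(changes)


counts = price_change_counts(prices_cents)
for change in sorted(counts):
    print(f"{change:+d} cents: {counts[change]} trades")
```

The output is a handful of integer buckets dominated by zero, rather than a smooth continuum of values.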
Non-Constant Volume
- Issue: The volume involved in transactions is not constant.
- Impact: Cannot be ignored at the high frequency level.
- Pattern: Activity is heaviest at the open and the close and thins out around lunchtime. Even now, the basic human need to eat takes priority over financial trading.
- Not all rows of data carry equal weight: a trade of 100 shares is not the same as a trade of 8,000 shares.
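One standard way to let volume matter is to weight each trade by its share count, as in the volume-weighted average price (VWAP). The sketch below contrasts it with a simple average on three hypothetical trades that reuse the 100-share and 8,000-share sizes from the example; the prices are made up.

```python
# Hypothetical trades: (price in dollars, shares). Illustrative only.
trades = [(57.10, 100), (57.20, 8000), (57.00, 100)]


def simple_average_price(trades):
    """Every row counts equally, regardless of trade size."""
    return sum(price for price, _ in trades) / len(trades)


def vwap(trades):
    """Volume-weighted average price: each trade weighted by its share count."""
    total_shares = sum(shares for _, shares in trades)
    return sum(price * shares for price, shares in trades) / total_shares


print(f"simple average: {simple_average_price(trades):.4f}")
print(f"VWAP:           {vwap(trades):.4f}")
```

The simple average treats the two 100-share trades as just as informative as the 8,000-share trade, so it sits noticeably below the VWAP, which is pulled toward the large trade's price.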
Multiple Transactions within Single Second Windows
- Transactions can occur at different prices within the same second.
- Implication: Drilling down to the millisecond is becoming increasingly common.
- Computational Challenges: Millisecond timestamps increase the data size, making files difficult to handle or email; the data would typically need to be loaded onto a server.
- Time Dimension: The definition of time becomes elastic, making the data difficult to work with.
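To see why one-second resolution is not enough, the sketch below buckets hypothetical millisecond-stamped trades by whole second. Within one bucket there are several trades at different prices, and a second-resolution timestamp alone cannot say in what order they happened. The data and the function name are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical trades: (seconds after the open, with milliseconds; price in dollars).
trades = [(0.120, 57.10), (0.480, 57.12), (0.910, 57.11), (1.050, 57.15), (1.700, 57.13)]


def group_by_whole_second(trades):
    """Bucket trade prices by the whole second in which they occurred."""
    buckets = defaultdict(list)
    for t, price in trades:
        buckets[int(t)].append(price)
    return dict(buckets)


grouped = group_by_whole_second(trades)
for second, prices in grouped.items():
    print(second, prices, "distinct prices:", len(set(prices)))
```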
Statistical Mirages
- Quirks can lead to statistical mirages.
- High frequency data requires special care.
Non-Synchronous Trading
- Non-synchronous trading is a key cause of statistical mirages.
- Example: Stocks A and B have no causal link, but A trades more frequently than B. If news arrives at the end of the day, A's price may reflect it today, whereas B's price does not reflect it until B trades tomorrow. Statistically, A then appears to lead B.
- Definition: A and B are not synchronized; A reacts to news more quickly because it trades more often.
- Daily Data: This is not a concern with daily data, because every major stock trades at least once a day.
- High Frequency: At the tick level, there are moments when a stock trades and moments when it doesn't.
- This creates all kinds of hiccups.
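The A/B example can be reproduced with a small simulation, an illustrative sketch rather than anything from the lecture. Assumptions: both stocks' true returns load on the same news shock plus their own noise, neither stock drives the other, A trades every period, and B skips a period with probability `p_no_trade`, in which case its return is recorded as zero and shows up at its next trade.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_nonsync(n_periods=20000, p_no_trade=0.5, seed=0):
    """Observed returns for a frequently traded stock A and a thinly traded stock B."""
    rng = random.Random(seed)
    obs_a, obs_b = [], []
    pending_b = 0.0  # B's true return not yet revealed by a trade
    for _ in range(n_periods):
        news = rng.gauss(0.0, 1.0)  # common news shock hitting both stocks
        true_a = news + rng.gauss(0.0, 1.0)
        true_b = news + rng.gauss(0.0, 1.0)
        obs_a.append(true_a)  # A always trades
        pending_b += true_b
        if rng.random() >= p_no_trade:  # B trades: reveal accumulated return
            obs_b.append(pending_b)
            pending_b = 0.0
        else:  # B does not trade: observed return is zero
            obs_b.append(0.0)
    return obs_a, obs_b


a, b = simulate_nonsync()
a_leads_b = pearson(a[:-1], b[1:])  # corr(A_t, B_{t+1})
b_leads_a = pearson(b[:-1], a[1:])  # corr(B_t, A_{t+1})
print(f"corr(A_t, B_t+1) = {a_leads_b:.3f}")
print(f"corr(B_t, A_t+1) = {b_leads_a:.3f}")
```

A's return today correlates clearly with B's return tomorrow, while the reverse correlation is near zero, even though by construction neither stock leads the other. The apparent lead comes entirely from B's missed trades.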
Calculating Return and Autocorrelation
- Issue: It is not even clear that the return is the quantity we want.
- At the daily level, Caterpillar's return today is essentially uncorrelated with its return yesterday or two days ago.
- At the transaction level, however, there are large negative correlations: if Caterpillar's price went up on the previous transaction, it tends to go down over the next two or even three. It is tempting to build a trading strategy on that.
- Problem: The pattern on the screen that tempts us to build a trading strategy is a statistical mirage.
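The transaction-to-transaction correlations described above are sample autocorrelations of the return (or price-change) series at lags 1, 2, 3, and so on. A minimal sketch of the usual estimator follows; the series of price changes is hypothetical, not the Caterpillar data.

```python
def sample_autocorrelation(x, lag):
    """Sample autocorrelation of series x at a positive lag (uses the full-sample mean)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Hypothetical tick-to-tick price changes in cents (illustrative only).
changes = [1, -1, 0, 1, -1, -1, 1, 0, 1, -1, 0, 1, -1]
for lag in (1, 2, 3):
    print(lag, round(sample_autocorrelation(changes, lag), 3))
```

A strongly negative lag-1 value like the one this toy series produces is the kind of number that invites a trading strategy; the estimator itself says nothing about whether the pattern is real or a mirage.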
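Mathematical Derivation of the One-Transaction Correlation
The one-transaction correlation of observed returns can be written as follows:
$$\operatorname{Corr}\left(r^o_t, r^o_{t+1}\right) = \frac{-\pi\mu^2}{\sigma^2 + \dfrac{2\pi\mu^2}{1-\pi}}$$
where
- $r^o_t$: The observed return in period $t$.
- $\mu$: The mean return throughout the day.
- $\sigma^2$: The variance of the return that day.
- $\pi$: The probability of the stock not trading during a period of time (non-synchronous trading).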
- The negative sign attached to otherwise positive quantities (a squared mean, a variance, a probability) is what produces the big negative spike.
- Non-Synchronous Trading: Tesla and Starbucks are certain to trade every day. When a stock always trades, the probability of not trading is zero, so you do not have to worry about non-synchronous trading.
- Artifact: The negative values are an artifact of non-synchronous trading, not an exploitable pattern.
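The mechanism can be checked numerically. The sketch below assumes the standard non-trading model (Lo and MacKinlay, 1990): true returns are i.i.d. normal with mean μ and variance σ², and in each period the stock fails to trade with probability π, in which case its observed return is zero and the true return accumulates until the next trade. Under that model the lag-one correlation of observed returns is -πμ² / (σ² + 2πμ²/(1-π)). The code evaluates that expression and compares it with a simulation; the function names and parameter values are illustrative choices.

```python
import random


def nontrading_lag1_corr(mu, sigma2, pi):
    """Lag-one autocorrelation of observed returns under the non-trading model."""
    return -pi * mu**2 / (sigma2 + 2 * pi * mu**2 / (1 - pi))


def simulate_observed_returns(mu, sigma, pi, n, seed=0):
    """Observed returns when the stock fails to trade with probability pi each period."""
    rng = random.Random(seed)
    observed, pending = [], 0.0
    for _ in range(n):
        pending += rng.gauss(mu, sigma)  # true return for this period
        if rng.random() < pi:  # no trade: nothing is observed yet
            observed.append(0.0)
        else:  # trade: reveal everything accumulated since the last trade
            observed.append(pending)
            pending = 0.0
    return observed


def lag1_corr(x):
    """Pearson correlation between x_t and x_{t+1}."""
    a, b = x[:-1], x[1:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


mu, sigma, pi = 1.0, 1.0, 0.5
theory = nontrading_lag1_corr(mu, sigma**2, pi)
sim = lag1_corr(simulate_observed_returns(mu, sigma, pi, n=200_000))
print(f"theory {theory:.3f}, simulation {sim:.3f}")
```

The mean μ is set large relative to σ here only so the effect is easy to see in a short simulation. Setting `pi` to 0 (a stock that always trades) makes the formula return zero, matching the Tesla and Starbucks point above: the negative correlation comes from the non-trading mechanism, not from any exploitable pattern.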