Intel Demos Single Chip with 48 Cores

scandal · December 3, 2009

Here's what turns my crank: it's a working 'algorithmic trader in a box' :
a single 32 core Nehalem system with LDMA feed handlers connected to the full suite of US equities and options exchanges (CTA/UTP/OPRA), market access gateways, and 8 cores to spare for running trading algos.

Have you seen this?

Yup. And http://www.activfinancial.com and http://www.quanthouse.com ....

mawilson · December 3, 2009

Here's what turns my crank: it's a working 'algorithmic trader in a box' :
a single 32 core Nehalem system with LDMA feed handlers connected to the full suite of US equities and options exchanges (CTA/UTP/OPRA), market access gateways, and 8 cores to spare for running trading algos.

Have you seen this?

Yup. And http://www.activfinancial.com and http://www.quanthouse.com ....

Not the same thing. These guys sell hardware-accelerated cards that process raw feeds, not some in-house ticker plant solution.

Niels Bohr · December 3, 2009

That's what they meant by low-latency trading. The key is really in implementing an atomic instruction set. That's different from OS development, where blocking synchronization is needed if resources are shared. Thanks.

I looked at the Celoxica set and their top-level diagram. They have one FPGA CPU dedicated to processing the network protocol, which saves the second CPU from handling the network protocol and therefore reduces latency. Pretty basic. I thought it was powerful.

But this opened my eyes to high-speed trading. Gold mine, here I come.

Looks like this is all new to me. I never did any projects related to trading. Didn't know there was something like DWCAS.

You can start here. With the steady growth of multi-core systems, lock-free algorithms that can scale with the availability of new cores are becoming increasingly important.
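
For anyone else new to this, here is a minimal sketch of the compare-and-swap (CAS) retry loop that lock-free algorithms are built on. It is only an emulation: Python has no user-level CAS instruction, so the hypothetical `AtomicCell` class uses a lock purely to stand in for the atomicity a CPU instruction would provide. The cell holds a (value, version) pair that is swapped as one unit, which is the kind of two-word update a double-width CAS (DWCAS) is used for. All names here are made up for illustration.

```python
import threading


class AtomicCell:
    """Emulated CAS word. On real hardware the compare-and-swap is a single
    atomic instruction; the lock below only stands in for that atomicity."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        # Store `new` only if the cell still holds `expected`.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def cas_add(cell, delta):
    """Retry loop: read, compute, attempt the swap, start over if we lost."""
    while True:
        value, version = cell.load()
        # Value and version are swapped together, as a DWCAS would do.
        if cell.compare_and_swap((value, version), (value + delta, version + 1)):
            return


def run(threads=4, per_thread=1000):
    cell = AtomicCell((0, 0))

    def worker():
        for _ in range(per_thread):
            cas_add(cell, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.load()
```

No thread ever waits while holding the data: a thread that loses the race simply re-reads and tries again. The version counter is there so that a stale read is detected even if the value happens to look unchanged.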

Edited December 3, 2009 by Niels Bohr

mawilson · December 4, 2009

That's what they meant by low-latency trading. The key is really in implementing an atomic instruction set. That's different from OS development, where blocking synchronization is needed if resources are shared. Thanks.

Sort of.

Latency is defined as the time spent between the hardware receiving an external event and the software (application thread) responding to it.

A lot of things happen in between:

1) An external event is signalled via an interrupt. The hardware adapter receives the interrupt and passes the interrupt to the CPU by raising the CPU's interrupt pin. Interrupts can be bound or routed to specific CPUs (look up "I/O APIC").

2) The CPU passes the interrupt to the OS. The OS runs the interrupt handler, likely stealing the context of the running thread (i.e. the thread currently running on the interrupted CPU, if any).

3) An application thread waiting for the event (e.g. a data packet via select() or another blocking system call) is put on the run queue. Depending on its priority and scheduling class, it could preempt another running thread, forcing a context switch.

4) The application thread runs, potentially causing a series of page faults to page its working set in, and responds.

All these factors - thread scheduling, context switching, paging, CPUs processing other interrupts, etc. - contribute to latency.
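
A rough way to see the tail end of this path (steps 3 and 4) is to time how long a thread blocked in select() takes to respond after data is written. The sketch below is an assumption-laden illustration, not a benchmark: it uses a local socket pair instead of a network adapter, so no hardware interrupt is involved, and the function name `measure_wakeup_latency_ns` is made up. The measured numbers also include Python interpreter overhead.

```python
import select
import socket
import threading
import time


def measure_wakeup_latency_ns(samples=50):
    """Time from 'sender stamps the message' to 'receiver has read it'."""
    rx, tx = socket.socketpair()
    ready = threading.Event()
    latencies = []

    def sender():
        for _ in range(samples):
            ready.wait()          # wait until the receiver asks for a message
            ready.clear()
            time.sleep(0.001)     # give the receiver time to block in select()
            stamp = time.perf_counter_ns()
            tx.sendall(stamp.to_bytes(8, "little"))

    thread = threading.Thread(target=sender, daemon=True)
    thread.start()
    try:
        for _ in range(samples):
            ready.set()
            select.select([rx], [], [])   # block until the data arrives
            sent_ns = int.from_bytes(rx.recv(8), "little")
            latencies.append(time.perf_counter_ns() - sent_ns)
        thread.join()
    finally:
        rx.close()
        tx.close()
    return latencies


if __name__ == "__main__":
    result = sorted(measure_wakeup_latency_ns())
    print("median ns:", result[len(result) // 2], "max ns:", result[-1])
```

The spread between the median and the maximum is usually the interesting part, since scheduling, context switches, and paging show up as outliers rather than as a shift in the typical case.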

scandal · December 4, 2009

Here's what turns my crank: it's a working 'algorithmic trader in a box' :
a single 32 core Nehalem system with LDMA feed handlers connected to the full suite of US equities and options exchanges (CTA/UTP/OPRA), market access gateways, and 8 cores to spare for running trading algos.

Have you seen this?

Yup. And http://www.activfinancial.com and http://www.quanthouse.com ....

Not the same thing. These guys sell hardware-accelerated cards that process raw feeds, not some in-house ticker plant solution.

Actually, Activ has a hardware (FPGA) based approach (ActivFeed MPU) which competes directly with Celoxica.

There are pros and cons to building feed handlers in software vs. hardware and there will probably be a market ecology for both approaches for years into the future.

Hardware can be faster of course, but is less flexible. Whenever the upstream exchange changes its spec you have to update your feed handler. For software, this is just a binary patch - easy to put in and easy to back out if something goes wrong. With hardware you need to flash your chipset with the update. Harder to do, harder to debug issues/configuration problems, and much harder to back out. This is particularly a problem if you're interfacing to many exchanges that change their feed spec frequently (e.g. CME, LIFFE, ICE...).

Which brings up the issue of exchange support. Most of the hardware guys seem to be concentrating on equity exchanges. All fine and good, but if you trade on various ECNs and offshore futures and options markets, and maybe need a Reuters or a Bloomberg feed ... good luck getting hardware support for all of these. Software vendors tend to have broader exchange support for these kinds of diverse exchange connectivity needs. The hardware guys also work better at servicing protocol based feeds (FIX/FAST, ITCH). They don't do as well as the software guys for session oriented API based feeds.

Finally, I think the hw vs. sw question has a lot to do with your trading environment. On the one hand, if you're supporting a trading floor with a few hundred (or more) screen based traders then the latency difference between a software vs. hardware feed handler isn't going to matter. You'll wind up distributing it through some kind of market data infrastructure (e.g. RMDS or 29West). The hop latencies for those application to application context switches, even in a fast LAN (1Gig or 10Gig) environment, will make a software based feed handler acceptable for most needs.

On the other hand, if you're doing an embedded black box trader which is sitting colocated at an exchange on a fast cross connect, and you're worrying about shared memory transfers from the feed to the trading algo - then yes, I would agree that a hardware based feed handler is probably appropriate and worth the hassle.

That's what they meant by low-latency trading. The key is really in implementing an atomic instruction set. That's different from OS development, where blocking synchronization is needed if resources are shared. Thanks.

Sort of.

Latency is defined as the time spent between the hardware receiving an external event and the software (application thread) responding to it.

A lot of things happen in between:

1) An external event is signalled via an interrupt. The hardware adapter receives the interrupt and passes the interrupt to the CPU by raising the CPU's interrupt pin. Interrupts can be bound or routed to specific CPUs (look up "I/O APIC").

2) The CPU passes the interrupt to the OS. The OS runs the interrupt handler, likely stealing the context of the running thread (i.e. the thread currently running on the interrupted CPU, if any).

3) An application thread waiting for the event (e.g. a data packet via select() or another blocking system call) is put on the run queue. Depending on its priority and scheduling class, it could preempt another running thread, forcing a context switch.

4) The application thread runs, potentially causing a series of page faults to page its working set in, and responds.

All these factors - thread scheduling, context switching, paging, CPUs processing other interrupts, etc. - contribute to latency.

This is all true. However, it assumes that the application is on the same box as the external event, and that there is one "application". In typical market data settings there are many trading applications all consuming market data and making trading decisions based upon it. The applications are spread out throughout a corporate LAN (or WAN) environment - most likely a collection of server based (headless) processes as well as desktop based GUI traders. In this setting the overall latency is a combination of external latency (time from the quote source to your front door - i.e. the feed handler), time to traverse the internal market data platform from the feed handler, and time for the consuming application to receive and respond.
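
As a back-of-the-envelope sketch of that decomposition, the three components simply add up, and it is the share of each that tells you where optimization effort pays off. The class name `LatencyBudget` and the microsecond figures in the usage lines are made-up placeholders for illustration only, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """End-to-end latency split into the three legs described above."""
    external_us: float   # quote source to the feed handler
    platform_us: float   # feed handler through the internal market data platform
    consumer_us: float   # consuming application receives and responds

    def total_us(self):
        return self.external_us + self.platform_us + self.consumer_us

    def shares(self):
        """Fraction of the total spent in each leg."""
        total = self.total_us()
        return {
            "external": self.external_us / total,
            "platform": self.platform_us / total,
            "consumer": self.consumer_us / total,
        }


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the arithmetic easy to follow.
    budget = LatencyBudget(external_us=500.0, platform_us=300.0, consumer_us=200.0)
    print(budget.total_us())
    print(budget.shares())
```

If one leg dominates the total, shaving time off the others changes the end-to-end figure very little.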

Also, for a time-sensitive trading strategy it's not only the inbound tick rate that counts. It's also the outbound order execution. If you're aggressing an order you only win the race if you're first back to the matching engine.

20 posts in this topic