August 15, 2010
When collaborative filtering goes wrong. Curiously, this was the only part of the concerto that made it to my Amazon recommendation list.

When collaborative filtering goes wrong. Curiously, this was the only part of the concerto that made it to my Amazon recommendation list.

8:00am  |   URL: http://tmblr.co/ZtlAMyv3Dj0
(View comments
Filed under: AI 
July 23, 2010
Sarah Connor Comes to Wall Street

So if you haven’t read it, Joseph Fuller’s The Terminator Comes to Wall Street is a well-written policy-advocating gloss. However, its technical merit is exactly what you’d expect from the PBK pimps whose most significant recent article involved an English professor nicknamed “Cockmaster D” advocating for “erotic intensity” between professor and student. I’ll stick to my problem sets, thanks.

So, anyway, the article is bifurcated nicely between well-meaning policy suggestions (“Executives should take more time to understand what’s going on”) and technical ignorance (“At the cutting edge of modeling science, researchers are trying to move away from relatively crude rules-based models toward models that approximate the processes of human reason.”) The crux of Fuller’s argument is that computers are bad — and they’re bad for three reasons. Briefly:

  1. Modelers don’t understand markets - especially the psychology of markets.
  2. Managers don’t understand modelers - and even if they did, computers can easily get too complicated for anyone to understand.
  3. Models don’t understand each other, especially how they interact with one another.

I’ll move through these in reverse order. Number three is an easy one — you could make an identical argument for human traders. Computers don’t introduce any kind of systemic risk that isn’t already there — we still have a multi-agent system with a bunch of profit-seekers interacting with one another. It’s an unfair standard to insist that computers have to have some kind of global, system-wide knowledge while people can trade ignorantly and unfettered.

Number two is a sound point about better management being needed, but Fuller loses me here:

The problem here goes beyond comprehension. Even if the executives were Quants, they might well not understand as much as they would like about the programs running their businesses. The models themselves—and particularly the interaction among models—has grown so complex that it may have become impossible for any human to fully grasp the types and volumes of derivatives traded in this way or to predict how the models will interact with each other.

This is a strange point to make, because if the models were simple enough to fully grasp, then there really wouldn’t be a point for them in the first place. Put another way, the whole reason you’re using computers operating off of complex relationships is to make something you can’t fully understand.

And Fuller’s first point, about computers not understanding “psychology”, is irrelevant. If real phenomena like trend-following or whatever exist meaningfully, they’ll be found and exploited. Instead, this line of reasoning smacks of an NBA announcer declaring a player who happens to hit a couple jump shots to be “on fire”. It’s people that see patterns and symbols when they’re not actually there, not computers that have trouble detecting regularities.

Ultimately, I think much of Fuller’s argument is based around misconceptions from Classic AI: computers as deterministic automata, blindly and unceasingly following some fixed source code, with human intelligence as the ultimate standard. It sounds like a pre-Moravec’s Paradox view of the world, where anything people find hard must be a “true test” for computers. In reality, it’s entirely appropriate to expect that when there are well-defined objectives and computers have the complete access to the policy state (no falling over!), computers will outperform people. Comparatively, making great trading bots is easy. Opening doors is hard.

8:00am  |   URL: http://tmblr.co/ZtlAMyodnbd
(View comments
Filed under: AI markets 
July 19, 2010
Should There Be a AAAI?

One of the (very valid) complaints about how I determined the best AI school was that I only looked at publications from AAAI/IJCAI. Each of the major subfields of AI have their own particular conferences that people will often prefer to send work to over AAAI. ML, probably the most dynamic and exciting part of today’s AI landscape, is better served by NIPS and ICML. Robotics, by ICRA. I’ve heard the case made that agent stuff is better at AAMAS.

So, what’s the point of AAAI? Arguments about fostering interdisciplinary work go out the window when there are a dozen parallel tracks. What we call AI is so vast and tenuous that interdisciplinary work might be a fool’s errand anyway. Furthermore, why attend a track that just has “pretty good” work in robotics?

The only thing I can think of in AAAI’s favor is that it’s important (for job searches, &c.) that there’s a recognizable AI conference that everyone knows is “good”. But if that ML paper was so important, why didn’t you publish it at NIPS?

7:14pm  |   URL: http://tmblr.co/ZtlAMynisVC
(View comments
Filed under: AI 
June 27, 2010
The Terminator Comes to Wall Street

10:38pm  |   URL: http://tmblr.co/ZtlAMyiMHZz
(View comments
Filed under: AI markets 
June 16, 2010
NYT Mag on IBM's Jeopardy AI

Better hurry and catch up, Cyc!

7:03pm  |   URL: http://tmblr.co/ZtlAMyg4vG2
(View comments
Filed under: AI 
June 13, 2010
Which is the Best AI School?

So in the recent US News rankings that everyone pretends not to care about, the “AI Specialty” rankings came out like this:

  • MIT
  • Carnegie Mellon
  • Stanford
  • Berkeley
  • Texas
  • Washington
  • Georgia Tech
  • UIUC/UMass (tie)
  • Penn

My understanding is that along with the numerical part of the survey, there’s a section in which department heads are supposed to write in names of schools that excel in specific specialties. Then, the number of times each school is mentioned is added and schools are ranked by how many times they appear. Obviously, this ranking methodology is crappy and so here I tried to do something better.

What I came up with was this:

  1. Identify the primary current research affiliation of the final author of every technical paper of the last several years of AAAI/IJCAI.
  2. Rank schools by the sum total of the counts of their affiliates.

This might seem quizzical, but it has some key advantages. It allows me to completely bypass the question of what AI is by answering it this way: AI is whatever gets published at AAAI/IJCAI. Trying to combine other, different conferences gets nasty quickly: comparing acceptance rates and “AI-ness” and numbers of papers and all that mess. Is EC an AI conference? CHI? ICRA? How “good” is UAI? My approach avoids these issues by focusing on the premier big tent conferences. And the ground they cover is really big — a list of AAAI 2010 keywords can be found on the second page of the call for papers.

Because there’s no good automatic way to do author affiliations at the time of publication, I focused on the current affiliation of the last author under the assumption that this would be the faculty member in charge of the lab responsible for the research (or the paper was single-authored by a grad student). In virtually all cases I looked up this turned out to be the case. There are cases where this methodology messes up, like this one, but I figure those are just tiny independent random shocks.

Before the rankings and analysis, a couple notes about my implementation. I looked at conferences from the past five years, so the AAAI programs from 2005, 2006, 2007, and 2008, and the IJCAI programs from 2005, 2007 and 2009. I didn’t count people who were the final author on only one publication. Mainly I just didn’t want to look up several hundred more names in Google, but also if you’ve only had one paper in the last seven major AI conferences you’re probably not making that much of an impact in AI anyway. Finally, just like US News I’m only looking at US schools; sorry, rest of the world. I’ll note in passing that several foreign schools are easily competitive with the schools listed here (Alberta and Southampton come to mind in particular).

So, with all that in mind, here’s the top ten:

  • Carnegie Mellon
  • UCLA/Washington (tie)
  • USC
  • MIT/UIUC/UMass (tie)
  • Texas
  • Harvard
  • Stanford

Rankings 2 through 7 were all within 2 papers and should be regarded as roughly equal, and Texas, Harvard, and Stanford were also within two papers of each other. The gap between CMU and the second-ranked schools was quite substantial, and CMU also had the most faculty members that were tallied as well. To qualitatively summarize the results in the most self-aggrandizing way possible:

  • Superior: Carnegie Mellon
  • Excellent: UCLA, Washington, USC, MIT, UIUC, UMass
  • Very Good: Texas, Harvard, Stanford

If there’s one way to summarize the differences between this ranking and the US News list, it’s that my ranking is more now. This is best seen by the three schools in my list but not the US News: UCLA, USC, and Harvard. All three of these schools have been making strong recent strides to hire good AI people, but certainly USC and Harvard are not “traditionally” considered computer science powerhouses, and that undoubtedly hurts them when it comes time for department heads to brainstorm AI schools.

Interestingly, by my tally no Berkeley researcher has had two publications in the last seven AAAI/IJCAI conferences. Obviously, there are fantastic AI researchers at Berkeley — Stuart Russell and Michael Jordan come to mind immediately — but this ranking shows they have not been particularly active in the recent big-tent conferences. If I had to guess, I would probably say faculty name recognition along with Russell writing the widely-used undergraduate AI textbook are responsible for Berkeley’s performance in the US News ranking.

8:05am  |   URL: http://tmblr.co/ZtlAMyfLyu0
(View comments
Filed under: AI 
June 4, 2010
Undergrad AI: Teach This, Not That

Undergrad AI should both give students an appreciation for AI as well as a glimpse of important research problems in the field. More and more, this means teaching the theory and practice of optimization. When IBM ads tell us they want a “smarter planet”, this is what they’re talking about.

So teach this:

  1. Machine Learning, through the Kernel Trick. Machine Learning is the most exciting part of AI right now, and students with an AI class under their belts should have exposure to things like clustering and SVMs.
  2. Mixed-Integer Programming. I don’t mean the principles behind how MIP solvers actually work, but students should know how to model a problems as MIPs and get a general sense of when MIPs work well. I think that MIPs really drive home a concept that’s central to AI, which is that the interesting problem you’re trying to solve is NP-hard but you can still do it a lot of the time.
  3. Local Search. A much more extensive introduction to local search should be explored. In particular, there should be a fuller discussion of different search algorithms, and a contrast drawn with MIPs about when using each is more appropriate.

and not that:

  1. Logic. First order logic, SAT, horn clauses, all of that garbage. Get rid of it. A friend of mine once said “SAT is the fourth-best way to solve any problem”.
  2. LISP. Functional programming is undeniably important and CS majors should certainly learn it at some point, but undergrad AI isn’t where they should be doing it. Cutting LISP fully away from the undergrad AI curriculum is an important step from disassociating the field from the bad old days of Strong AI nonsense.
  3. Genetic Algorithms. At both Harvard and CMU, a good amount of time was spent explaining these things, with only a cursory slide at the end of the presentation arguing that they might not be such a good idea. Genetic Algorithms deserve to lose their spot to more potent local search algorithms.

2:22pm  |   URL: http://tmblr.co/ZtlAMydaveH
(View comments
Filed under: AI 
May 29, 2010
Yesterday’s Shopping Bot of Tomorrow

There was a time a dozen years ago where there was a pretty clear path forward for what the Internet would enable (throw an “e-” on that bitch!). You’d want to buy a camera. But different features are worth different amounts to you. Zoom? Format? Maker? And you only want one camera. Or maybe two? Complicated! The obvious thing to do is just to incorporate your preferences into a bidding bot, who would then navigate the information superhighway on your behalf. Of course, your bidding bot would interact with the store’s selling bot, but also everyone else’s bidding bot. What would be the result of this interaction? What’s the correct way to structure this?

There is now an awfully extensive literature behind all of this. It was, I believe, the original “real world link” behind the presence of game theory in the multi-agent systems field, and the cause of the quaintly named ACM Conference on Electronic Commerce (EC). It’s also the reason my research group has the awful name Agent-Mediated Electronic Marketplaces Lab (hey, if we were all using shopping bots it would sound a lot better).

It’s ironic though that in the real world of Electronic Commerce, what’s succeeded is precisely the opposite. Instead of giving you a huge, expressive range of controls over items and prices, the cool, hip new services take a single thing for a single day and sells it for a single, really cheap price. Though Groupon is the hottest example, the real starter of the movement was Woot, which has been around since 2004. I remember when I first heard of it, the idea just seemed nuts. What kind of store sells only one thing, and changes that thing every day? Who would shop at a store that sold only a single external hard drive one day, and then only a single bluetooth headset the next day? Enough people to make a whole bunch of money, apparently.

So why have Wooty services succeeded and shopping bot services failed? I think it’s all symptomatic of one of our less visible challenges — that preference elicitation is incredibly hard. It’s of course hard computationally, with exponential state spaces, but the most challenging issue is that standard utility theory doesn’t do a great job of actually modeling the way people really behave. (You can either believe that the theory is defective, or that people are defective.) So even if you do manage to do it “right”, people will still get angry at you. NRMP Lawsuit, anyone? So what’s emerged from the Internet morass is the least eliciting mechanism possible: your preferences are just a single bit and the mechanism posts a price. Would a shopping bot model be more efficient? Yes, but only if your model is right — and we’re at the point now in “Electronic Commerce” where the failure of shopping bots to emerge argues that the model is wrong.

4:59am  |   URL: http://tmblr.co/ZtlAMycLpWe
(View comments
Filed under: AI 
May 18, 2010
Constantly Rebalanced Portfolios, Part Two

So last time I introduced the ideas behind CRPs. Now, why don’t they work in practice?

  1. They solve the wrong problem. One of the ideas central to CRP is that they are a long term strategy — that over time they will achieve sublinear regret with the best fixed strategy. But over long periods of time, the best fixed strategy becomes more and more meaningless relative to an adaptive adversary that can change its portfolio. Consider, for instance, investing in tech stocks in the ’90s, then cashing out, then investing in financial or oil companies in the ’00s, and then cashing out. Just because of the way the market is, the best fixed strategy in hindsight becomes less and less meaningful over the same long time spans over which CRPs are supposed to excel. Of course, CRPs do not achieve sublinear regret against the best adaptive strategy in hindsight — but this is, in a larger sense, the correct problem to solve.

  2. The bounds are meaningless. Let’s say we’ll use the Online Newton Step algorithm on the S&P 500, for a year. This algorithm achieves regret no larger than 5(1/a + GD)n log(T) where a, G, and D are pretty technical but take on values of about 1, T is the number of time steps (say 250 trading days in a year), and n is the dimension (here, 500). So, the bound is about 12,500. But this is an additive regret bound in log space. It seems reasonable to assume that even the best fixed CRP will not exceed a 55x return (which is e to the fourth power). Therefore, using this algorithm only proves that for each dollar you put into the algorithm, you will not end the period with fewer than exp(-12,496) dollars, which is close enough to zero to be zero. So just like in many other AI problems, the worst-case bound is meaningless here. Of course, this is not to say that such algorithms will work poorly in practice, but just to argue that there’s nothing magical about using one of these algorithms.

  3. The convex model is fundamentally flawed. Convex optimization requires a convex domain, in this case taken to be the simplex. But in the real world, nobody will let you put 100% of your money into a single name. There is, perhaps, a 10 or 20% maximum, and that’s if you have particularly willing investors. While the domain you are optimizing over is still convex, the optimal CRP in hindsight over the simplex could do much better than the optimal CRP over your restricted domain.

  4. They don’t take into account transaction costs. This is an issue that has been brought up before, notably by Blum and Kalai (warning, that link is a .ps.gz, ew). Essentially, introducing friction into your model slows down (and can even totally erase) your returns. Though this is the only issue that has been brought up in the literature, I think it’s actually the least important. As Cover alluded to in his article, by simply “caching” the moves you would have made and moving only when you have enough benefit to overcome transaction prices, you can accommodate these costs. Also, transaction costs just aren’t that significant anymore for hedge funds — my impression is that they would only slightly lower returns (say, in the 3-6% range).

  5. There’s no way to get the prices you see in hindsight. In the model, the CRP sees the daily return and then moves into the next day losslessly. But in practice, you don’t know the final prices until the end of the day, at which point you’re supposed to have already moved into your new position! This complication could be solved by moving throughout the day, or at the next day’s open, or what have you. Regardless, it’s a significant hit to the model.

  6. They don’t take into account market slippage. Even trading at the end of the day, in the auction, you’re likely to start moving prices if you put a lot of money into a single name. And of course, every time you move prices that hurts your return.

All of these factors combined conspired to kill Cover’s Mountain View Analytics. Still, I think the story is a wonderful illustration of the chasm between interesting, mathematically satisfying theory and nasty, brutish practice.

6:49pm  |   URL: http://tmblr.co/ZtlAMyaRjWQ
(View comments
Filed under: AI markets 
May 16, 2010
Constantly Rebalanced Portfolios, Part One

A decade ago, the Stanford Report was excited about a new hedge fund, Mountain View Analytics, run by the well-pedigreed Thomas Cover. To wit:

Tom Cover has the next-best thing to a time machine: He has an algorithm — a computational procedure — that uses the past to predict the future. It works as well or better than hindsight, outperforming a pretty good investment strategy: diversifying your stock portfolio and hoping that performance of superstars will more than make up for money wasted on losers. [several paragraphs redacted] So who wants to be a millionaire?

Sometime earlier this year, the domain registration on mountainviewanalytics.com expired, the page having not been updated for many years. These posts are intended to answer why constantly rebalanced portfolios (CRPs) — the broad class of investment strategies of which Cover’s is one — don’t actually work. In this post, I’ll give the technical and theoretical background, and in the next post, I’ll talk about the myriad of ways CRPs come up short in practice.

Why bother? Well, I think that CRPs are an interesting problem for two reasons:

  1. They provide a very illuminating example of the issues involved in bringing academic theory into practice, and the risks of hubris inherent in that process.

  2. They involve some very interesting math. They are featured at the end of the book Prediction, Learning, and Games (which I have considered buying if only for its cool cover art) which suggests they can be thought of as representing a culmination of learning theory, one of the more interesting topics I’ve had the opportunity to study in grad school.

The idea behind a CRP is that you always maintain a constant portfolio (distribution) of your wealth regardless of how the underlying assets in the portfolio change in value. So, if you’re mixing fifty-fifty between two stocks and one of them doubles in value while the other stays the same, you’d sell a quarter of your expensive stock holdings and use the proceeds to buy the cheaper stock — this maintains your holdings at an even split of wealth. Ideally, this scheme allows you to capture the value from positive swings in price while priming your portfolio to be ready for when undervalued assets become more in line. I probably should know a real answer to this, but my guess would be that if all the movements of the assets in your portfolio are completely arbitrary and not reflective of any kind of underlying value, then a CRP is your optimal policy.

Now, okay, let’s get into some math (sorry for the formatting, tumblr is nice for many things but not for this). Let x be a CRP (row vector of non-negative values, without loss of generality have it sum to 1 by introducing a “cash” asset if you’d like). Let aN represent a column vector of the return of the assets of the N-th day. (So if they were stocks, it would be a column vector that looks like [.97, 1.04, 1.01, …]).

Then we see:

return from best CRP in hindsight= max over x: (x * a1)(x * a2)(x * a3)*…

Now consider the RHS. By monotonicity of log, the same x will also solve:

max over x: log(x * a1) + log(x * a2) + log(x * a3) + … which is equivalent to

min over x: -log(x * a1) - log(x * a2) - log(x * a3) - …

So this is the insight transforms the task into an online convex optimization problem: minimizing a convex function (the sum of all those negative logs), over a convex domain (the simplex), which changes in an online fashion (you’re not statically optimizing, instead you get a new update every day).

If this sounds cool to you, you’ll probably want to check out Elad Hazan’s recent paper summarizing online convex optimization in this context. But essentially, there are now a large number of algorithms which stay competitive with the best CRP in hindsight. What’s neat about these algorithms is that, to achieve these bounds, you generally act to optimize over what you’ve currently seen, performing gradient-descent type algorithms over your historical data set. While gradient descent makes lots of sense in offline convex optimization, It’s very neat to me that such algorithms are successful in the online setting, where the shape of the space you’re optimizing over can change so dramatically from time step to time step, because assets could plummet or soar in value and you have no control over that.

So, I hope I’ve given a good illustration of how this should work in theory. Next time: why doesn’t it work in practice.

8:44pm  |   URL: http://tmblr.co/ZtlAMya551q
(View comments
Filed under: AI markets