Need help in reverse engineering FICO algorithms

Bears-PurdueFan · ‎07-06-2013

Hi folks,

I'm new to this board, as you can see. A Google search got me here, and boy, did I learn a lot from you guys. The contributors to this site are phenomenal.

One of my interests is creating "self-thinking" algorithms. Risk assessment is a solid starting point for decision making, and creditors are pioneers when it comes to quantifying behavior, so I decided to start with credit scoring algorithms. I was wondering whether anyone here has a mathematical proof or source code for any commercial financial sorting/ranking algorithm.

I'm also working on a simple Java application that would let a person submit their credit report in PDF format; the app would parse the report and produce a near-FICO score (once it is deemed robust, of course, it will be put on a website and be free for all). The beauty of this is that all you have to do is get your free annual credit report from all 3 bureaus, and the app can, in theory, give you a near-FICO score without your having to pay for fake scores. I have been trying to replicate people's scores from their reports using a Matlab script I wrote, but I am still running into issues. If you have a background in Computer Science and/or Mathematics and are well versed in creditworthiness, please let me know. Thank you.

cashnocredit · ‎07-06-2013

FICO scores are proprietary and highly valuable IP.

That means you will not be able to accurately reverse engineer them, because doing so would require a very large sample of credit files together with their FICO scores. Buying FICO scores for a randomly selected set of consumers can be done, but:

1. It is expensive.

2. It undoubtedly includes a contractual limitation on reverse engineering.

3. If you violate that, expect a restraining order and lawsuits you can't afford.

Good luck. You'll need it.

I have reestablished credit over the last couple years
so my moniker is, well, rather out of date.

WM Discover $1800, WF Plat 12k, Chase Freedom Siggy18k, Amex Plat (60k H/B), Citi AA EWMC 25k

Bears-PurdueFan · ‎07-06-2013

I forgot to include what I need to know for my Matlab estimations:

I need to know whether a BK in a credit profile is a variable or a constant in a FICO algorithm. As I understand it, a charge-off is a variable, meaning that the longer it stays on your report, the less it "means".
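To make the "constant versus variable" distinction concrete, here is a minimal sketch of the two candidate behaviors. Everything in it is an assumption made for illustration: the function names, the 100-point penalty, and the 2-year half-life are invented and say nothing about what FICO actually does.

```python
def constant_penalty(points: float, age_years: float) -> float:
    """A 'constant' item: costs the same no matter how old it is."""
    return points


def decaying_penalty(points: float, age_years: float,
                     half_life_years: float = 2.0) -> float:
    """A 'variable' item: the penalty halves every half_life_years.

    The exponential shape and the half-life are hypothetical choices.
    """
    return points * 0.5 ** (age_years / half_life_years)


if __name__ == "__main__":
    # Compare the two models for an item that starts at 100 points.
    for age in (0, 2, 4, 6):
        print(age,
              constant_penalty(100, age),
              round(decaying_penalty(100, age), 1))
```

Fitting both shapes to the same set of reports and comparing the errors is one way to test which description matches the observed scores better.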

Is a linear approximation what FICO uses to determine any sort of error being reported?

Is R^2 used to measure the quality of approximation?

Thanks!

Bears-PurdueFan · ‎07-06-2013

I see what you are saying, but in the field of computer science there is something called heuristics, which is essentially the study of the "signatures" of a program. I'm looking at how a program flows. It is basically how an antivirus program works: it looks for the signatures of a virus or malware, and it doesn't need to know the nuts and bolts of a virus to detect it. I don't need to break any laws or step on anyone's intellectual property to discover how their program works. I'm pretty close; I'm just missing a few facts about FICO's algorithms. My main problem is determining which inputs are variables and which are constants in the FICO algorithms.

Bears-PurdueFan · ‎07-06-2013

$\hat{s}_\tau^2 = \frac{1}{\tau} \sum_{t=1}^{\tau} (X_t - \hat{\mu}_\tau)^2$ is what I am using to calculate standard deviation. I have a background in Computer Science and Mathematics, but no emphasis on Stats. Anyone with a stronger Stats background, please chime in.
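The formula above sums squared deviations from the mean, which gives a variance; the standard deviation is its square root. A minimal sketch of both common conventions follows (dividing by n, or by n - 1 for a sample). The function name `std_dev` and the sample data are made up for illustration.

```python
import math
import statistics


def std_dev(xs, sample=True):
    """Standard deviation of xs.

    sample=True divides by n - 1 (sample estimate);
    sample=False divides by n (population value).
    """
    n = len(xs)
    mu = sum(xs) / n                        # mean of the window
    ss = sum((x - mu) ** 2 for x in xs)     # sum of squared deviations
    return math.sqrt(ss / (n - 1 if sample else n))


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(std_dev(data, sample=False))  # population convention
    print(std_dev(data, sample=True))   # sample convention
    # The standard library offers both conventions directly.
    print(statistics.pstdev(data), statistics.stdev(data))
```

With small samples the two conventions differ noticeably, so it is worth checking which one a given Matlab or library call uses before comparing results.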

cashnocredit · ‎07-06-2013

Removed response. Somehow mangled.


cashnocredit · ‎07-06-2013

FICO uses "scorecards" where they pre-group consumers into similar categories: long history, short files, PRs and BKs, etc.

Unfortunately, they are less than forthcoming about how these groups are determined. Understandably so, because it would be a huge benefit to anyone reverse engineering the algorithms.
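The scorecard idea can be sketched roughly as follows: first assign a profile to a group, then score it with that group's own weights. Every segment name, threshold, and point value here is invented for illustration, since the real grouping rules are not public.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    oldest_account_years: float
    num_accounts: int
    has_bankruptcy: bool
    utilization: float  # fraction of available credit in use, 0.0 to 1.0


def pick_scorecard(p: Profile) -> str:
    """Assign a profile to a segment (hypothetical rules)."""
    if p.has_bankruptcy:
        return "derogatory"
    if p.num_accounts < 3 or p.oldest_account_years < 2:
        return "thin_file"
    return "established"


# Hypothetical per-segment base points and weights.
SCORECARDS = {
    "derogatory":  {"base": 550, "per_year": 4,  "util_penalty": 60},
    "thin_file":   {"base": 620, "per_year": 10, "util_penalty": 80},
    "established": {"base": 700, "per_year": 3,  "util_penalty": 120},
}


def score(p: Profile) -> int:
    """Score a profile with the weights of its own segment only."""
    card = SCORECARDS[pick_scorecard(p)]
    raw = (card["base"]
           + card["per_year"] * min(p.oldest_account_years, 20)
           - card["util_penalty"] * p.utilization)
    return max(300, min(850, round(raw)))  # clamp to the 300-850 range


if __name__ == "__main__":
    clean = Profile(10, 8, False, 0.5)
    with_bk = Profile(10, 8, True, 0.5)
    print(pick_scorecard(clean), score(clean))
    print(pick_scorecard(with_bk), score(with_bk))
```

In this toy model the same utilization costs a different number of points in each segment, which is why a single regression fitted across all consumers would struggle to reproduce segmented scores.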

There are a number of texts on credit scoring, and you might start there. Also check out the work VantageScore has done using the new, fine-grained data the CRAs have recently collected. The data makes it trivial to distinguish pay-in-full consumers from those carrying a balance. This was not available until the last few years, and earlier estimates were error-prone.

Keep in mind that logistic regression is a different critter, and conventional measures such as r^2 are almost worthless for evaluating it.
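For a model that outputs a probability (for example, the probability of default), two measures commonly used in place of r^2 are log loss and AUC. The sketch below implements both in plain Python; the labels and predicted probabilities are made-up illustration data, not credit data.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the labels; lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def auc(y_true, p_pred):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. O(n^2), which is fine for a sketch.
    """
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0, 0, 1, 1]               # 1 = defaulted (illustrative)
    p = [0.1, 0.4, 0.35, 0.8]      # predicted default probabilities
    print("log loss:", round(log_loss(y, p), 4))
    print("AUC:", auc(y, p))
    print("Gini:", 2 * auc(y, p) - 1)
```

AUC only looks at how well the model ranks consumers, while log loss also penalizes badly calibrated probabilities, so the two are usually reported together.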


Bears-PurdueFan · ‎07-06-2013

VantageScore? Interesting. As usual, good information from people on this board. Thank you. I'm new to all of this; I have credit cards, but I never thought twice when I applied for them. Do you know of any place where I can buy credit data without having to be a creditor?

Bears-PurdueFan · ‎07-06-2013

The only reason I use r^2 is that I was getting better results with it... Do you have a better approach to this? I checked with an old college buddy who is at Wolfram, and he said I was going about it all wrong, as you are suggesting. Now I am thinking of using another approach, but my Stats knowledge is limited. Any suggestions?

llecs · ‎07-06-2013

Ditto to the scorecards. It's impossible to replicate because of the scoring buckets. My score is dependent on your credit and vice versa. Everyone is scored on a bell curve and is lumped into one of 12-14 scoring buckets (depending on the FICO version). If everyone's credit tanks because of the economy (read into that: everyone), then your score wouldn't suffer as much as it would if only your credit tanked and everyone else's was strong. That's why you see some people gain 70-100 points when a BK is removed and others only see 20-30, with others in between (or less, or more). And of course the same goes for point changes from anything else, like added accounts, added inquiries, length of credit history, etc.; it'll be different for everyone, largely due to their scoring bucket.

Also FICO isn't one formula. There are dozens of FICO formulas and lenders pick and choose what works for them based on their customers, their demographic, their needs, etc. There are 3 on myFICO alone. Taking a BK as an example, you can have one drop and see dozens of different score changes depending on how many FICO scores you are looking at (e.g. from myFICO, from industry-specific score-based lenders like auto loans, etc.).