%md
# Insurance Claims - Fraud Detection

**Business case:** Insurance fraud is a huge problem in the industry, and fraudulent claims are difficult to identify. IHS is in a unique position to help the auto insurance industry with this problem.

**Problem statement:** Data is stored in different systems, and it's difficult to build analytics across multiple data sources. Copying data into a single platform is time-consuming.

**Business solution:** Use S3 as a data lake to store the different data sources in a single platform. This allows data scientists and analysts to quickly analyze the data and generate reports to predict market trends and/or make financial decisions.

**Technical solution:** Use Databricks as a single platform to pull the various data sources from API endpoints or batch dumps into S3 for further processing. ETL the CSV datasets into the efficient Parquet format for performant processing.
%md
In this example, we will work with auto insurance data to demonstrate how to create a model that predicts whether an insurance claim is fraudulent. This is a binary classification task, and we will build a Decision Tree model.

With the prediction data, we can estimate the total predicted fraudulent claim amount and zoom in on various features, such as a breakdown of predicted fraud count by insured hobbies, our model's best predictor.

We will cover the following steps to illustrate how we build a Machine Learning Pipeline:

* Data Import
* Data Exploration
* Data Processing
* Create Decision Tree Model
* Measuring Error Rate
* Model Tuning
* Zooming in on Prediction Data
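To make the "Measuring Error Rate" and "Zooming in on Prediction Data" steps concrete, here is a minimal plain-Python sketch (not the Spark pipeline itself) of how an error rate and a total predicted fraudulent claim amount could be computed from prediction output. The records and the field names `label`, `prediction`, and `claim_amount` are hypothetical and invented for illustration; they are not results from this notebook.

```python
# Hypothetical prediction records: 1 = fraud, 0 = not fraud.
# All values are invented for illustration only.
predictions = [
    {"label": 1, "prediction": 1, "claim_amount": 71610},
    {"label": 0, "prediction": 0, "claim_amount": 34650},
    {"label": 1, "prediction": 0, "claim_amount": 5070},
    {"label": 0, "prediction": 1, "claim_amount": 6500},
]


def error_rate(records):
    """Fraction of records where the prediction differs from the label."""
    wrong = sum(1 for r in records if r["prediction"] != r["label"])
    return wrong / len(records)


def predicted_fraud_amount(records):
    """Sum of claim amounts over records the model flags as fraud."""
    return sum(r["claim_amount"] for r in records if r["prediction"] == 1)


print(error_rate(predictions))              # 0.5
print(predicted_fraud_amount(predictions))  # 78110
```

Two of the four hypothetical predictions disagree with their labels, giving an error rate of 0.5, and the two claims flagged as fraud add up to 78110.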
%md
## Data Import

First, download [this CSV file](https://raw.githubusercontent.com/jodb/sparkTestData/master/insurance_claims.csv) locally.

The data used in this example comes from a CSV file that was imported using the Tables UI; see [Databases and Tables](https://docs.databricks.com/user-guide/tables.html).

After uploading the data using the UI, we can run SparkSQL queries against the table or create a DataFrame from the table. In this example, we will create a Spark DataFrame.
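# Alternative: read the table registered through the Tables UI
# data = spark.table("insurance_claims")
fileStorePath = "/FileStore/tables/dgshzbe11508270992746/insurance_claims.csv"

# Read the CSV with a header row and inferred column types, then drop the extra _c39 column
data = spark.read.format("csv")\
  .options(inferSchema="true", header="true")\
  .load(fileStorePath)\
  .drop("_c39")

# Cast the two date columns to strings
df = data.withColumn("policy_bind_date", data.policy_bind_date.cast("string"))\
  .withColumn("incident_date", data.incident_date.cast("string"))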
# Preview data
display(df)
328 | 48 | 521585 | 2014-10-17 00:00:00 | OH | 250/500 | 1000 | 1406.91 | 0 | 466132 | MALE | MD | craft-repair | sleeping | husband | 53300 | 0 | 2015-01-25 00:00:00 | Single Vehicle Collision | Side Collision | Major Damage | Police | SC | Columbus | 9935 4th Drive | 5 | 1 | YES | 1 | 2 | YES | 71610 | 6510 | 13020 | 52080 | Saab | 92x | 2004 | Y |
228 | 42 | 342868 | 2006-06-27 00:00:00 | IN | 250/500 | 2000 | 1197.22 | 5000000 | 468176 | MALE | MD | machine-op-inspct | reading | other-relative | 0 | 0 | 2015-01-21 00:00:00 | Vehicle Theft | ? | Minor Damage | Police | VA | Riverwood | 6608 MLK Hwy | 8 | 1 | ? | 0 | 0 | ? | 5070 | 780 | 780 | 3510 | Mercedes | E400 | 2007 | Y |
134 | 29 | 687698 | 2000-09-06 00:00:00 | OH | 100/300 | 2000 | 1413.14 | 5000000 | 430632 | FEMALE | PhD | sales | board-games | own-child | 35100 | 0 | 2015-02-22 00:00:00 | Multi-vehicle Collision | Rear Collision | Minor Damage | Police | NY | Columbus | 7121 Francis Lane | 7 | 3 | NO | 2 | 3 | NO | 34650 | 7700 | 3850 | 23100 | Dodge | RAM | 2007 | N |
256 | 41 | 227811 | 1990-05-25 00:00:00 | IL | 250/500 | 2000 | 1415.74 | 6000000 | 608117 | FEMALE | PhD | armed-forces | board-games | unmarried | 48900 | -62400 | 2015-01-10 00:00:00 | Single Vehicle Collision | Front Collision | Major Damage | Police | OH | Arlington | 6956 Maple Drive | 5 | 1 | ? | 1 | 2 | NO | 63400 | 6340 | 6340 | 50720 | Chevrolet | Tahoe | 2014 | Y |
228 | 44 | 367455 | 2014-06-06 00:00:00 | IL | 500/1000 | 1000 | 1583.91 | 6000000 | 610706 | MALE | Associate | sales | board-games | unmarried | 66000 | -46000 | 2015-02-17 00:00:00 | Vehicle Theft | ? | Minor Damage | None | NY | Arlington | 3041 3rd Ave | 20 | 1 | NO | 0 | 1 | NO | 6500 | 1300 | 650 | 4550 | Accura | RSX | 2009 | N |
256 | 39 | 104594 | 2006-10-12 00:00:00 | OH | 250/500 | 1000 | 1351.1 | 0 | 478456 | FEMALE | PhD | tech-support | bungie-jumping | unmarried | 0 | 0 | 2015-01-02 00:00:00 | Multi-vehicle Collision | Rear Collision | Major Damage | Fire | SC | Arlington | 8973 Washington St | 19 | 3 | NO | 0 | 2 | NO | 64100 | 6410 | 6410 | 51280 | Saab | 95 | 2003 | Y |
137 | 34 | 413978 | 2000-06-04 00:00:00 | IN | 250/500 | 1000 | 1333.35 | 0 | 441716 | MALE | PhD | prof-specialty | board-games | husband | 0 | -77000 | 2015-01-13 00:00:00 | Multi-vehicle Collision | Front Collision | Minor Damage | Police | NY | Springfield | 5846 Weaver Drive | 0 | 3 | ? | 0 | 0 | ? | 78650 | 21450 | 7150 | 50050 | Nissan | Pathfinder | 2012 | N |
165 | 37 | 429027 | 1990-02-03 00:00:00 | IL | 100/300 | 1000 | 1137.03 | 0 | 603195 | MALE | Associate | tech-support | base-jumping | unmarried | 0 | 0 | 2015-02-27 00:00:00 | Multi-vehicle Collision | Front Collision | Total Loss | Police | VA | Columbus | 3525 3rd Hwy | 23 | 3 | ? | 2 | 2 | YES | 51590 | 9380 | 9380 | 32830 | Audi | A5 | 2015 | N |
27 | 33 | 485665 | 1997-02-05 00:00:00 | IL | 100/300 | 500 | 1442.99 | 0 | 601734 | FEMALE | PhD | other-service | golf | own-child | 0 | 0 | 2015-01-30 00:00:00 | Single Vehicle Collision | Front Collision | Total Loss | Police | WV | Arlington | 4872 Rock Ridge | 21 | 1 | NO | 1 | 1 | YES | 27700 | 2770 | 2770 | 22160 | Toyota | Camry | 2012 | N |
212 | 42 | 636550 | 2011-07-25 00:00:00 | IL | 100/300 | 500 | 1315.68 | 0 | 600983 | MALE | PhD | priv-house-serv | camping | wife | 0 | -39300 | 2015-01-05 00:00:00 | Single Vehicle Collision | Rear Collision | Total Loss | Other | NC | Hillsdale | 3066 Francis Ave | 14 | 1 | NO | 2 | 1 | ? | 42300 | 4700 | 4700 | 32900 | Saab | 92x | 1996 | N |
235 | 42 | 543610 | 2002-05-26 00:00:00 | OH | 100/300 | 500 | 1253.12 | 4000000 | 462283 | FEMALE | Masters | exec-managerial | dancing | other-relative | 38400 | 0 | 2015-01-06 00:00:00 | Single Vehicle Collision | Front Collision | Total Loss | Police | NY | Northbend | 1558 1st Ridge | 22 | 1 | YES | 2 | 2 | ? | 87010 | 7910 | 15820 | 63280 | Ford | F150 | 2002 | N |
447 | 61 | 214618 | 1999-05-29 00:00:00 | OH | 100/300 | 2000 | 1137.16 | 0 | 615561 | FEMALE | High School | exec-managerial | skydiving | other-relative | 0 | -51000 | 2015-02-15 00:00:00 | Multi-vehicle Collision | Front Collision | Major Damage | Fire | SC | Springfield | 5971 5th Hwy | 21 | 3 | YES | 1 | 2 | YES | 114920 | 17680 | 17680 | 79560 | Audi | A3 | 2006 | N |
60 | 23 | 842643 | 1997-11-20 00:00:00 | OH | 500/1000 | 500 | 1215.36 | 3000000 | 432220 | MALE | MD | protective-serv | reading | wife | 0 | 0 | 2015-01-22 00:00:00 | Single Vehicle Collision | Rear Collision | Total Loss | Ambulance | SC | Northbend | 6655 5th Drive | 9 | 1 | YES | 1 | 0 | NO | 56520 | 4710 | 9420 | 42390 | Saab | 95 | 2000 | N |
121 | 34 | 626808 | 2012-10-26 00:00:00 | OH | 100/300 | 1000 | 936.61 | 0 | 464652 | FEMALE | MD | armed-forces | bungie-jumping | wife | 52800 | -32800 | 2015-01-08 00:00:00 | Parked Car | ? | Minor Damage | None | SC | Springfield | 6582 Elm Lane | 5 | 1 | NO | 1 | 1 | NO | 7280 | 1120 | 1120 | 5040 | Toyota | Highlander | 2010 | N |
180 | 38 | 644081 | 1998-12-28 00:00:00 | OH | 250/500 | 2000 | 1301.13 | 0 | 476685 | FEMALE | College | machine-op-inspct | board-games | not-in-family | 41300 | -55500 | 2015-01-15 00:00:00 | Single Vehicle Collision | Rear Collision | Total Loss | Police | SC | Springfield | 6851 3rd Drive | 12 | 1 | NO | 0 | 2 | YES | 46200 | 4200 | 8400 | 33600 | Dodge | Neon | 2003 | Y |
473 | 58 | 892874 | 1992-10-19 00:00:00 | IN | 100/300 | 2000 | 1131.4 | 0 | 458733 | FEMALE | MD | transport-moving | movies | other-relative | 55700 | 0 | 2015-01-29 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Other | WV | Hillsdale | 9573 Weaver Ave | 12 | 4 | YES | 0 | 0 | NO | 63120 | 10520 | 10520 | 42080 | Accura | MDX | 1999 | Y |
70 | 26 | 558938 | 2005-06-08 00:00:00 | OH | 500/1000 | 1000 | 1199.44 | 5000000 | 619884 | MALE | College | machine-op-inspct | hiking | own-child | 63600 | 0 | 2015-02-22 00:00:00 | Multi-vehicle Collision | Rear Collision | Major Damage | Other | NY | Riverwood | 5074 3rd St | 0 | 3 | ? | 1 | 2 | YES | 52110 | 5790 | 5790 | 40530 | Nissan | Maxima | 2012 | N |
140 | 31 | 275265 | 2004-11-15 00:00:00 | IN | 500/1000 | 500 | 708.64 | 6000000 | 470610 | MALE | High School | machine-op-inspct | reading | unmarried | 53500 | 0 | 2015-01-06 00:00:00 | Single Vehicle Collision | Side Collision | Total Loss | Police | WV | Northbend | 4546 Tree St | 9 | 1 | NO | 0 | 2 | YES | 77880 | 14160 | 7080 | 56640 | Suburu | Legacy | 2015 | N |
160 | 37 | 921202 | 2014-12-28 00:00:00 | OH | 500/1000 | 500 | 1374.22 | 0 | 472135 | FEMALE | MD | craft-repair | yachting | other-relative | 45500 | -37800 | 2015-01-19 00:00:00 | Single Vehicle Collision | Side Collision | Total Loss | Other | NY | Northbrook | 3842 Solo Ridge | 19 | 1 | YES | 1 | 0 | NO | 72930 | 6630 | 13260 | 53040 | Accura | TL | 2015 | N |
196 | 39 | 143972 | 1992-08-02 00:00:00 | IN | 500/1000 | 2000 | 1475.73 | 0 | 477670 | FEMALE | High School | handlers-cleaners | camping | own-child | 57000 | -27300 | 2015-02-22 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Police | VA | Columbus | 8101 3rd Ridge | 8 | 3 | ? | 2 | 0 | NO | 60400 | 6040 | 6040 | 48320 | Nissan | Pathfinder | 2014 | N |
460 | 62 | 183430 | 2002-06-25 00:00:00 | IN | 250/500 | 1000 | 1187.96 | 4000000 | 618845 | MALE | JD | other-service | bungie-jumping | own-child | 0 | 0 | 2015-01-01 00:00:00 | Multi-vehicle Collision | Rear Collision | Minor Damage | Police | NY | Columbus | 5380 Pine St | 20 | 3 | NO | 1 | 0 | ? | 47160 | 0 | 5240 | 41920 | Suburu | Impreza | 2011 | N |
217 | 41 | 431876 | 2005-11-27 00:00:00 | IL | 500/1000 | 2000 | 875.15 | 0 | 442479 | FEMALE | Associate | machine-op-inspct | skydiving | own-child | 46700 | 0 | 2015-02-10 00:00:00 | Multi-vehicle Collision | Side Collision | Total Loss | Police | SC | Arlington | 8957 Weaver Drive | 15 | 3 | ? | 1 | 2 | ? | 37840 | 0 | 4730 | 33110 | Accura | RSX | 1996 | N |
370 | 55 | 285496 | 1994-05-27 00:00:00 | IL | 100/300 | 2000 | 972.18 | 0 | 443920 | MALE | High School | prof-specialty | paintball | other-relative | 72700 | -68200 | 2015-01-11 00:00:00 | Multi-vehicle Collision | Rear Collision | Major Damage | Ambulance | SC | Hillsdale | 2526 Embaracadero Ave | 20 | 3 | NO | 0 | 0 | YES | 71520 | 17880 | 5960 | 47680 | Suburu | Forrestor | 2000 | Y |
413 | 55 | 115399 | 1991-02-08 00:00:00 | IN | 100/300 | 2000 | 1268.79 | 0 | 453148 | MALE | MD | priv-house-serv | chess | own-child | 0 | -31000 | 2015-01-19 00:00:00 | Single Vehicle Collision | Front Collision | Total Loss | Ambulance | WV | Northbend | 5667 4th Drive | 15 | 1 | ? | 2 | 2 | ? | 98160 | 8180 | 16360 | 73620 | Dodge | RAM | 2011 | Y |
237 | 40 | 736882 | 1996-02-02 00:00:00 | IN | 100/300 | 1000 | 883.31 | 0 | 434733 | MALE | College | craft-repair | kayaking | husband | 0 | -53500 | 2015-02-24 00:00:00 | Single Vehicle Collision | Rear Collision | Minor Damage | Other | VA | Riverwood | 2502 Apache Hwy | 6 | 1 | NO | 1 | 3 | NO | 77880 | 7080 | 14160 | 56640 | Ford | Escape | 2005 | N |
8 | 35 | 699044 | 2013-12-05 00:00:00 | OH | 100/300 | 2000 | 1266.92 | 0 | 613982 | MALE | Masters | sales | polo | own-child | 0 | 0 | 2015-01-09 00:00:00 | Multi-vehicle Collision | Rear Collision | Major Damage | Other | OH | Arlington | 3418 Texas Lane | 16 | 3 | NO | 1 | 3 | YES | 71500 | 16500 | 11000 | 44000 | Ford | Escape | 2006 | Y |
257 | 43 | 863236 | 1990-09-20 00:00:00 | IN | 100/300 | 2000 | 1322.1 | 0 | 436984 | MALE | High School | prof-specialty | golf | own-child | 0 | -29200 | 2015-01-28 00:00:00 | Parked Car | ? | Minor Damage | Police | PA | Arlington | 2533 Elm St | 4 | 1 | YES | 1 | 3 | YES | 9020 | 1640 | 820 | 6560 | Toyota | Camry | 2005 | N |
202 | 34 | 608513 | 2002-07-18 00:00:00 | IN | 100/300 | 500 | 848.07 | 3000000 | 607730 | MALE | JD | exec-managerial | chess | not-in-family | 31000 | -30200 | 2015-01-07 00:00:00 | Vehicle Theft | ? | Minor Damage | None | VA | Northbrook | 3790 Andromedia Hwy | 5 | 1 | YES | 2 | 1 | ? | 5720 | 1040 | 520 | 4160 | Suburu | Forrestor | 2003 | Y |
224 | 40 | 914088 | 1990-02-08 00:00:00 | OH | 100/300 | 2000 | 1291.7 | 0 | 609837 | FEMALE | JD | sales | kayaking | not-in-family | 0 | -55600 | 2015-01-08 00:00:00 | Single Vehicle Collision | Side Collision | Minor Damage | Other | SC | Northbend | 3220 Rock Drive | 21 | 1 | NO | 1 | 0 | YES | 69840 | 7760 | 15520 | 46560 | Dodge | Neon | 2009 | N |
241 | 45 | 596785 | 2014-03-04 00:00:00 | IL | 500/1000 | 2000 | 1104.5 | 0 | 432211 | FEMALE | PhD | machine-op-inspct | basketball | unmarried | 0 | 0 | 2015-02-15 00:00:00 | Single Vehicle Collision | Rear Collision | Minor Damage | Police | SC | Northbrook | 2100 Francis Drive | 5 | 1 | NO | 2 | 2 | NO | 91650 | 14100 | 14100 | 63450 | Accura | TL | 2011 | N |
64 | 25 | 908616 | 2000-02-18 00:00:00 | IL | 250/500 | 1000 | 954.16 | 0 | 473328 | MALE | Masters | prof-specialty | video-games | husband | 53200 | 0 | 2015-01-18 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Ambulance | SC | Columbus | 4687 5th Drive | 22 | 4 | NO | 0 | 0 | ? | 75600 | 12600 | 12600 | 50400 | Toyota | Corolla | 2005 | N |
166 | 37 | 666333 | 2008-06-19 00:00:00 | IL | 100/300 | 2000 | 1337.28 | 8000000 | 610393 | MALE | JD | craft-repair | reading | husband | 27500 | 0 | 2015-02-28 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Police | WV | Riverwood | 9038 2nd Lane | 10 | 3 | NO | 2 | 2 | ? | 67140 | 7460 | 7460 | 52220 | Ford | F150 | 2006 | Y |
155 | 35 | 336614 | 2003-08-01 00:00:00 | IL | 500/1000 | 1000 | 1088.34 | 0 | 614780 | FEMALE | Associate | adm-clerical | yachting | other-relative | 81100 | 0 | 2015-02-24 00:00:00 | Multi-vehicle Collision | Front Collision | Total Loss | Police | NY | Arlington | 6092 5th Ave | 16 | 3 | YES | 2 | 3 | NO | 29790 | 3310 | 3310 | 23170 | BMW | 3 Series | 2008 | N |
114 | 30 | 584859 | 1992-04-04 00:00:00 | IL | 100/300 | 1000 | 1558.29 | 0 | 472248 | MALE | High School | farming-fishing | video-games | wife | 51400 | -64000 | 2015-01-09 00:00:00 | Multi-vehicle Collision | Front Collision | Major Damage | Ambulance | NY | Hillsdale | 8353 Britain Ridge | 1 | 3 | NO | 1 | 2 | ? | 77110 | 14020 | 14020 | 49070 | Suburu | Impreza | 2015 | N |
149 | 37 | 990493 | 1991-01-13 00:00:00 | IL | 500/1000 | 500 | 1415.68 | 0 | 603381 | MALE | PhD | prof-specialty | yachting | own-child | 0 | 0 | 2015-02-12 00:00:00 | Single Vehicle Collision | Side Collision | Total Loss | Fire | WV | Hillsdale | 3540 Maple St | 17 | 1 | YES | 0 | 1 | YES | 64800 | 10800 | 5400 | 48600 | Audi | A3 | 1999 | N |
147 | 33 | 129872 | 2010-08-08 00:00:00 | OH | 100/300 | 1000 | 1334.15 | 6000000 | 479224 | MALE | High School | craft-repair | reading | not-in-family | 53300 | -49200 | 2015-01-24 00:00:00 | Single Vehicle Collision | Front Collision | Major Damage | Other | WV | Springfield | 3104 Sky Drive | 15 | 1 | YES | 2 | 0 | YES | 53100 | 10620 | 5310 | 37170 | Mercedes | C300 | 1995 | Y |
62 | 28 | 200152 | 2003-03-09 00:00:00 | IL | 100/300 | 1000 | 988.45 | 0 | 430141 | FEMALE | Masters | protective-serv | camping | unmarried | 0 | 0 | 2015-01-09 00:00:00 | Single Vehicle Collision | Rear Collision | Total Loss | Police | NY | Northbrook | 4981 Weaver St | 3 | 1 | ? | 1 | 1 | YES | 60200 | 6020 | 6020 | 48160 | Suburu | Forrestor | 2004 | Y |
289 | 49 | 933293 | 1993-02-03 00:00:00 | IL | 500/1000 | 2000 | 1222.48 | 0 | 620757 | FEMALE | JD | priv-house-serv | golf | unmarried | 0 | 0 | 2015-01-18 00:00:00 | Parked Car | ? | Minor Damage | None | WV | Arlington | 6676 Tree Lane | 16 | 1 | NO | 1 | 1 | YES | 5330 | 1230 | 820 | 3280 | Suburu | Legacy | 2001 | N |
431 | 54 | 485664 | 2002-11-25 00:00:00 | IN | 500/1000 | 2000 | 1155.55 | 0 | 615901 | FEMALE | MD | craft-repair | bungie-jumping | unmarried | 65700 | 0 | 2015-01-21 00:00:00 | Multi-vehicle Collision | Rear Collision | Major Damage | Police | NY | Hillsdale | 3930 Embaracadero St | 4 | 3 | ? | 2 | 0 | ? | 62300 | 12460 | 6230 | 43610 | Jeep | Wrangler | 2007 | N |
199 | 37 | 982871 | 1997-07-27 00:00:00 | IN | 250/500 | 500 | 1262.08 | 0 | 474615 | MALE | JD | tech-support | video-games | wife | 48500 | 0 | 2015-01-08 00:00:00 | Single Vehicle Collision | Front Collision | Major Damage | Ambulance | NC | Columbus | 3422 Flute St | 4 | 1 | ? | 0 | 3 | NO | 60170 | 10940 | 10940 | 38290 | Nissan | Pathfinder | 2011 | Y |
79 | 26 | 206213 | 1995-05-08 00:00:00 | IL | 100/300 | 500 | 1451.62 | 0 | 456446 | MALE | Associate | tech-support | kayaking | not-in-family | 0 | -55700 | 2015-01-03 00:00:00 | Single Vehicle Collision | Rear Collision | Minor Damage | Ambulance | WV | Columbus | 4862 Lincoln Hwy | 19 | 1 | NO | 2 | 2 | ? | 40000 | 8000 | 4000 | 28000 | BMW | M5 | 2010 | N |
116 | 34 | 616337 | 2012-08-30 00:00:00 | IN | 250/500 | 500 | 1737.66 | 0 | 470577 | MALE | Associate | transport-moving | chess | unmarried | 0 | -24100 | 2015-01-01 00:00:00 | Single Vehicle Collision | Side Collision | Major Damage | Police | WV | Northbrook | 5719 2nd Lane | 1 | 1 | ? | 1 | 1 | ? | 97080 | 16180 | 16180 | 64720 | BMW | X5 | 2001 | Y |
37 | 23 | 448961 | 2006-04-30 00:00:00 | IL | 500/1000 | 500 | 1475.93 | 0 | 441648 | FEMALE | College | prof-specialty | hiking | husband | 0 | -67400 | 2015-01-16 00:00:00 | Multi-vehicle Collision | Side Collision | Minor Damage | Other | SC | Springfield | 3221 Solo Ridge | 17 | 3 | YES | 1 | 0 | NO | 51660 | 5740 | 5740 | 40180 | Dodge | RAM | 2010 | N |
106 | 30 | 790442 | 2003-04-13 00:00:00 | OH | 250/500 | 500 | 538.17 | 0 | 433782 | FEMALE | PhD | transport-moving | reading | own-child | 49700 | -60200 | 2015-02-10 00:00:00 | Single Vehicle Collision | Rear Collision | Total Loss | Other | NC | Arlington | 6660 MLK Drive | 23 | 1 | NO | 2 | 2 | NO | 51120 | 5680 | 5680 | 39760 | Mercedes | E400 | 2005 | N |
269 | 44 | 108844 | 2007-12-05 00:00:00 | IL | 100/300 | 2000 | 1081.08 | 0 | 468104 | MALE | JD | priv-house-serv | reading | unmarried | 36400 | -28700 | 2015-02-14 00:00:00 | Single Vehicle Collision | Front Collision | Minor Damage | Other | SC | Springfield | 1699 Oak Drive | 14 | 1 | YES | 0 | 2 | ? | 56400 | 11280 | 11280 | 33840 | Toyota | Highlander | 2014 | N |
265 | 40 | 430029 | 2006-08-21 00:00:00 | IL | 250/500 | 1000 | 1454.43 | 0 | 459407 | FEMALE | MD | protective-serv | yachting | husband | 0 | 0 | 2015-02-21 00:00:00 | Multi-vehicle Collision | Rear Collision | Total Loss | Other | NY | Arlington | 4234 Cherokee Lane | 17 | 3 | NO | 2 | 3 | ? | 55120 | 6890 | 0 | 48230 | Accura | MDX | 2002 | N |
163 | 33 | 529112 | 1990-01-08 00:00:00 | IN | 100/300 | 500 | 1240.47 | 0 | 472573 | FEMALE | Associate | other-service | polo | husband | 35300 | 0 | 2015-02-18 00:00:00 | Multi-vehicle Collision | Rear Collision | Total Loss | Fire | NC | Northbend | 7476 4th St | 11 | 3 | YES | 1 | 1 | ? | 77110 | 0 | 14020 | 63090 | Honda | Civic | 2014 | N |
355 | 47 | 939631 | 1990-03-18 00:00:00 | OH | 500/1000 | 2000 | 1273.7 | 4000000 | 433473 | MALE | College | other-service | kayaking | husband | 0 | 0 | 2015-01-10 00:00:00 | Multi-vehicle Collision | Front Collision | Major Damage | Fire | WV | Arlington | 8907 Tree Ave | 19 | 3 | NO | 2 | 1 | NO | 62800 | 6280 | 6280 | 50240 | Audi | A3 | 2003 | Y |
175 | 34 | 866931 | 2008-01-07 00:00:00 | IN | 500/1000 | 1000 | 1123.87 | 8000000 | 446326 | FEMALE | PhD | protective-serv | dancing | other-relative | 0 | 0 | 2015-02-26 00:00:00 | Vehicle Theft | ? | Trivial Damage | Police | NY | Arlington | 6619 Flute Ave | 5 | 1 | ? | 2 | 0 | YES | 7290 | 810 | 810 | 5670 | Volkswagen | Passat | 1995 | N |
192 | 35 | 582011 | 1997-03-10 00:00:00 | IL | 100/300 | 1000 | 1245.89 | 0 | 435481 | FEMALE | Masters | exec-managerial | movies | wife | 0 | -40300 | 2015-01-01 00:00:00 | Single Vehicle Collision | Rear Collision | Total Loss | Other | WV | Springfield | 6011 Britain St | 19 | 1 | NO | 0 | 0 | ? | 76600 | 15320 | 7660 | 53620 | Mercedes | C300 | 2000 | N |
430 | 59 | 691189 | 2004-01-10 00:00:00 | OH | 250/500 | 2000 | 1326.62 | 7000000 | 477310 | MALE | MD | other-service | bungie-jumping | own-child | 0 | 0 | 2015-01-03 00:00:00 | Multi-vehicle Collision | Front Collision | Minor Damage | Fire | NY | Riverwood | 5104 Francis Drive | 19 | 3 | ? | 0 | 3 | YES | 81800 | 16360 | 8180 | 57260 | Nissan | Pathfinder | 1998 | N |
91 | 27 | 537546 | 1994-08-20 00:00:00 | IL | 100/300 | 2000 | 1073.83 | 0 | 609930 | FEMALE | JD | farming-fishing | polo | husband | 0 | 0 | 2015-01-17 00:00:00 | Vehicle Theft | ? | Trivial Damage | None | NY | Arlington | 2280 4th Ave | 4 | 1 | ? | 1 | 2 | ? | 7260 | 1320 | 660 | 5280 | BMW | M5 | 2008 | N |
217 | 39 | 394975 | 2002-06-02 00:00:00 | IN | 100/300 | 1000 | 1530.52 | 0 | 603993 | MALE | College | armed-forces | basketball | not-in-family | 0 | 0 | 2015-02-22 00:00:00 | Vehicle Theft | ? | Minor Damage | None | WV | Northbend | 2644 Elm Drive | 8 | 1 | ? | 2 | 1 | YES | 4300 | 430 | 430 | 3440 | Toyota | Corolla | 2000 | N |
223 | 40 | 729634 | 1994-04-28 00:00:00 | IN | 100/300 | 500 | 1201.41 | 0 | 437818 | FEMALE | JD | priv-house-serv | movies | husband | 88400 | -46500 | 2015-01-27 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Police | NC | Columbus | 7466 MLK Ridge | 7 | 3 | YES | 1 | 0 | ? | 70510 | 12820 | 12820 | 44870 | Suburu | Forrestor | 1999 | N |
195 | 39 | 282195 | 2014-08-17 00:00:00 | OH | 250/500 | 1000 | 1393.57 | 0 | 478423 | MALE | PhD | machine-op-inspct | movies | not-in-family | 47600 | -39600 | 2015-02-27 00:00:00 | Parked Car | ? | Minor Damage | Police | VA | Northbend | 5821 2nd St | 5 | 1 | NO | 0 | 1 | YES | 2640 | 480 | 480 | 1680 | Ford | F150 | 2009 | N |
22 | 26 | 420810 | 2007-08-11 00:00:00 | OH | 100/300 | 1000 | 1276.57 | 0 | 467784 | MALE | PhD | craft-repair | skydiving | not-in-family | 71500 | 0 | 2015-01-06 00:00:00 | Single Vehicle Collision | Rear Collision | Minor Damage | Fire | NY | Arlington | 6723 Best Drive | 3 | 1 | YES | 1 | 2 | NO | 78900 | 15780 | 7890 | 55230 | Chevrolet | Silverado | 1995 | N |
439 | 56 | 524836 | 2008-11-20 00:00:00 | IN | 250/500 | 500 | 1082.49 | 0 | 606714 | FEMALE | PhD | prof-specialty | chess | unmarried | 36100 | -55000 | 2015-02-28 00:00:00 | Multi-vehicle Collision | Front Collision | Major Damage | Fire | SC | Columbus | 4866 4th Hwy | 12 | 3 | ? | 2 | 3 | ? | 56430 | 0 | 6270 | 50160 | Honda | CRV | 2014 | N |
94 | 32 | 307195 | 1995-10-18 00:00:00 | IN | 500/1000 | 1000 | 1414.74 | 0 | 464691 | FEMALE | Masters | adm-clerical | hiking | own-child | 0 | 0 | 2015-02-22 00:00:00 | Parked Car | ? | Minor Damage | None | VA | Riverwood | 5418 Britain Ave | 19 | 1 | NO | 1 | 3 | NO | 2400 | 300 | 300 | 1800 | Chevrolet | Silverado | 2014 | N |
11 | 39 | 623648 | 1993-05-19 00:00:00 | IL | 250/500 | 2000 | 1470.06 | 0 | 431683 | MALE | PhD | other-service | yachting | husband | 56600 | -45800 | 2015-01-07 00:00:00 | Single Vehicle Collision | Front Collision | Total Loss | Ambulance | WV | Riverwood | 4296 Pine Hwy | 22 | 1 | YES | 0 | 1 | NO | 65790 | 7310 | 7310 | 51170 | Saab | 93 | 2007 | N |
151 | 36 | 485372 | 2005-02-26 00:00:00 | OH | 250/500 | 2000 | 870.63 | 0 | 431725 | FEMALE | MD | adm-clerical | kayaking | own-child | 94800 | -58500 | 2015-01-06 00:00:00 | Multi-vehicle Collision | Side Collision | Minor Damage | Police | VA | Hillsdale | 2299 1st St | 12 | 3 | NO | 1 | 1 | NO | 62920 | 11440 | 5720 | 45760 | Ford | Escape | 2000 | N |
154 | 34 | 598554 | 1990-02-14 00:00:00 | IN | 100/300 | 500 | 795.23 | 0 | 609216 | MALE | PhD | machine-op-inspct | base-jumping | other-relative | 36900 | 0 | 2015-01-10 00:00:00 | Multi-vehicle Collision | Rear Collision | Major Damage | Police | NY | Springfield | 6618 Cherokee Drive | 15 | 3 | YES | 2 | 1 | ? | 69480 | 15440 | 0 | 54040 | Nissan | Maxima | 2014 | Y |
245 | 44 | 303987 | 1993-09-30 00:00:00 | IL | 500/1000 | 1000 | 1168.2 | 0 | 452787 | MALE | JD | handlers-cleaners | basketball | husband | 69100 | 0 | 2015-02-11 00:00:00 | Multi-vehicle Collision | Side Collision | Total Loss | Other | OH | Springfield | 7459 Flute St | 23 | 3 | NO | 0 | 3 | NO | 44280 | 7380 | 3690 | 33210 | Honda | Accord | 1997 | N |
119 | 32 | 343161 | 2014-06-10 00:00:00 | IL | 500/1000 | 1000 | 993.51 | 0 | 468767 | MALE | High School | armed-forces | bungie-jumping | unmarried | 0 | -49500 | 2015-01-12 00:00:00 | Single Vehicle Collision | Side Collision | Minor Damage | Other | WV | Hillsdale | 3567 4th Drive | 12 | 1 | NO | 0 | 3 | YES | 56300 | 5630 | 11260 | 39410 | BMW | M5 | 2011 | N |
215 | 42 | 519312 | 2008-10-28 00:00:00 | OH | 500/1000 | 500 | 1848.81 | 0 | 435489 | MALE | JD | transport-moving | video-games | own-child | 0 | -49000 | 2015-02-06 00:00:00 | Multi-vehicle Collision | Front Collision | Major Damage | Fire | WV | Northbend | 2457 Washington Ave | 20 | 3 | YES | 2 | 2 | YES | 68520 | 11420 | 5710 | 51390 | Suburu | Legacy | 2003 | Y |
295 | 42 | 132902 | 2007-04-24 00:00:00 | OH | 250/500 | 2000 | 1641.73 | 5000000 | 450149 | MALE | PhD | sales | chess | not-in-family | 62400 | 0 | 2015-01-20 00:00:00 | Multi-vehicle Collision | Rear Collision | Total Loss | Fire | VA | Riverwood | 1269 Flute Drive | 16 | 3 | NO | 0 | 0 | NO | 59130 | 6570 | 6570 | 45990 | Ford | Escape | 2006 | Y |
254 | 39 | 332867 | 1993-12-13 00:00:00 | IN | 100/300 | 500 | 1362.87 | 0 | 458364 | FEMALE | MD | exec-managerial | chess | other-relative | 35700 | 0 | 2015-02-22 00:00:00 | Multi-vehicle Collision | Front Collision | Minor Damage | Ambulance | NY | Arlington | 1218 Sky Hwy | 6 | 3 | YES | 2 | 2 | NO | 82320 | 13720 | 6860 | 61740 | Dodge | Neon | 1995 | Y |
107 | 31 | 356590 | 2011-08-17 00:00:00 | IN | 250/500 | 500 | 1239.22 | 7000000 | 476458 | FEMALE | High School | tech-support | paintball | not-in-family | 43400 | -91200 | 2015-01-30 00:00:00 | Single Vehicle Collision | Side Collision | Minor Damage | Fire | SC | Springfield | 9169 Pine Ridge | 12 | 1 | YES | 0 | 1 | NO | 89700 | 13800 | 13800 | 62100 | Audi | A5 | 2009 | Y |
478 | 64 | 346002 | 1990-08-20 00:00:00 | OH | 250/500 | 500 | 835.02 | 0 | 602433 | FEMALE | Associate | adm-clerical | reading | unmarried | 59600 | 0 | 2015-02-02 00:00:00 | Multi-vehicle Collision | Side Collision | Minor Damage | Fire | WV | Hillsdale | 8538 Texas Lane | 17 | 3 | NO | 1 | 1 | NO | 33930 | 0 | 3770 | 30160 | BMW | X6 | 1998 | N |
128 | 30 | 500533 | 1994-02-11 00:00:00 | OH | 100/300 | 1000 | 1061.33 | 0 | 478575 | MALE | MD | machine-op-inspct | movies | own-child | 43300 | -66200 | 2015-01-10 00:00:00 | Single Vehicle Collision | Front Collision | Major Damage | Ambulance | WV | Northbrook | 5783 Oak Ave | 8 | 1 | NO | 0 | 3 | NO | 68530 | 12460 | 6230 | 49840 | Audi | A5 | 1997 | N |
338 | 49 | 348209 | 1994-02-22 00:00:00 | IN | 500/1000 | 1000 | 1279.08 | 0 | 449718 | MALE | MD | other-service | kayaking | own-child | 0 | -51500 | 2015-02-27 00:00:00 | Parked Car | ? | Minor Damage | None | NC | Riverwood | 7721 Washington Ridge | 13 | 1 | NO | 0 | 1 | ? | 4300 | 860 | 860 | 2580 | Ford | F150 | 2004 | N |
271 | 42 | 486676 | 2011-08-15 00:00:00 | OH | 100/300 | 500 | 1105.49 | 0 | 463181 | FEMALE | Associate | prof-specialty | sleeping | own-child | 56200 | -50000 | 2015-02-20 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Other | SC | Hillsdale | 8006 Maple Hwy | 12 | 2 | ? | 2 | 3 | ? | 68310 | 12420 | 6210 | 49680 | Audi | A3 | 2003 | Y |
222 | 41 | 260845 | 1998-11-11 00:00:00 | OH | 100/300 | 2000 | 1055.53 | 0 | 441992 | FEMALE | MD | armed-forces | cross-fit | not-in-family | 37800 | -50300 | 2015-02-08 00:00:00 | Single Vehicle Collision | Front Collision | Total Loss | Other | WV | Northbrook | 6751 Pine Ridge | 7 | 1 | NO | 0 | 2 | NO | 61290 | 6810 | 6810 | 47670 | Honda | Civic | 1995 | Y |
199 | 41 | 657045 | 1995-12-04 00:00:00 | OH | 250/500 | 1000 | 895.83 | 0 | 452597 | FEMALE | Associate | sales | paintball | husband | 0 | 0 | 2015-02-11 00:00:00 | Single Vehicle Collision | Rear Collision | Minor Damage | Ambulance | NC | Arlington | 2324 Texas Ridge | 10 | 1 | NO | 1 | 2 | NO | 30100 | 3010 | 0 | 27090 | Chevrolet | Malibu | 1999 | N |
215 | 37 | 761189 | 2002-12-28 00:00:00 | IN | 100/300 | 500 | 1632.93 | 0 | 614417 | FEMALE | College | transport-moving | golf | not-in-family | 0 | -42900 | 2015-02-23 00:00:00 | Multi-vehicle Collision | Rear Collision | Minor Damage | Fire | SC | Riverwood | 7923 Elm Ave | 7 | 3 | NO | 2 | 0 | YES | 57120 | 9520 | 4760 | 42840 | Mercedes | C300 | 2002 | N |
192 | 40 | 175177 | 2004-04-15 00:00:00 | IL | 100/300 | 1000 | 1405.99 | 0 | 472895 | FEMALE | Associate | sales | yachting | wife | 0 | 0 | 2015-03-01 00:00:00 | Multi-vehicle Collision | Side Collision | Minor Damage | Ambulance | VA | Springfield | 4755 Best Lane | 18 | 3 | YES | 1 | 0 | YES | 42930 | 9540 | 4770 | 28620 | BMW | X6 | 2005 | N |
120 | 35 | 116700 | 2001-02-02 00:00:00 | OH | 100/300 | 1000 | 1425.54 | 0 | 475847 | FEMALE | High School | transport-moving | bungie-jumping | other-relative | 78300 | 0 | 2015-01-15 00:00:00 | Multi-vehicle Collision | Front Collision | Total Loss | Ambulance | SC | Riverwood | 5053 Tree Drive | 22 | 3 | NO | 2 | 0 | NO | 51210 | 11380 | 5690 | 34140 | Ford | Fusion | 2010 | N |
270 | 45 | 166264 | 2010-01-12 00:00:00 | OH | 500/1000 | 1000 | 1038.09 | 0 | 476978 | FEMALE | College | handlers-cleaners | golf | husband | 0 | -19700 | 2015-01-14 00:00:00 | Multi-vehicle Collision | Front Collision | Minor Damage | Fire | NY | Springfield | 2078 3rd Ave | 18 | 3 | NO | 1 | 1 | YES | 89400 | 14900 | 7450 | 67050 | Suburu | Legacy | 1998 | N |
319 | 47 | 527945 | 1992-04-14 00:00:00 | IN | 250/500 | 500 | 1307.11 | 0 | 600648 | MALE | College | transport-moving | dancing | not-in-family | 0 | 0 | 2015-02-17 00:00:00 | Multi-vehicle Collision | Front Collision | Total Loss | Police | WV | Northbrook | 2804 Best St | 22 | 3 | NO | 0 | 2 | ? | 59730 | 10860 | 10860 | 38010 | Audi | A3 | 2005 | N |
194 | 39 | 627540 | 2010-05-21 00:00:00 | OH | 500/1000 | 1000 | 1489.24 | 6000000 | 608335 | FEMALE | JD | other-service | kayaking | wife | 0 | -45000 | 2015-01-24 00:00:00 | Vehicle Theft | ? | Minor Damage | None | SC | Springfield | 7877 Sky Lane | 15 | 1 | YES | 2 | 2 | YES | 8060 | 1240 | 1240 | 5580 | Saab | 95 | 2004 | N |
227 | 38 | 279422 | 2013-10-27 00:00:00 | OH | 500/1000 | 500 | 976.67 | 0 | 471600 | FEMALE | PhD | handlers-cleaners | polo | unmarried | 0 | 0 | 2015-01-21 00:00:00 | Single Vehicle Collision | Rear Collision | Major Damage | Fire | SC | Northbrook | 6530 Weaver Ave | 16 | 1 | ? | 1 | 2 | ? | 72200 | 14440 | 7220 | 50540 | BMW | M5 | 2013 | Y |
137 | 31 | 484200 | 1994-10-12 00:00:00 | OH | 250/500 | 2000 | 1340.43 | 0 | 441175 | MALE | High School | exec-managerial | paintball | husband | 52700 | -40600 | 2015-02-19 00:00:00 | Multi-vehicle Collision | Side Collision | Minor Damage | Ambulance | NC | Arlington | 3087 Oak Hwy | 6 | 3 | NO | 1 | 2 | NO | 50800 | 10160 | 10160 | 30480 | Accura | MDX | 2005 | N |
244 | 40 | 645258 | 1997-07-04 00:00:00 | OH | 500/1000 | 2000 | 1267.81 | 5000000 | 603123 | FEMALE | Masters | exec-managerial | video-games | wife | 0 | 0 | 2015-01-03 00:00:00 | Vehicle Theft | ? | Trivial Damage | None | NC | Northbrook | 7098 Lincoln Hwy | 10 | 1 | ? | 2 | 1 | ? | 6600 | 660 | 1320 | 4620 | Accura | TL | 2005 | N |
78 | 29 | 694662 | 2011-02-15 00:00:00 | IL | 250/500 | 1000 | 1234.2 | 6000000 | 457767 | MALE | Masters | other-service | bungie-jumping | other-relative | 0 | 0 | 2015-01-29 00:00:00 | Vehicle Theft | ? | Minor Damage | Police | NY | Northbrook | 5124 Maple St | 3 | 1 | YES | 2 | 2 | NO | 7500 | 750 | 1500 | 5250 | Nissan | Maxima | 2002 | N |
200 | 35 | 960680 | 1994-08-21 00:00:00 | IN | 250/500 | 2000 | 1318.06 | 0 | 618498 | MALE | High School | exec-managerial | video-games | wife | 57300 | -80600 | 2015-01-19 00:00:00 | Vehicle Theft | ? | Trivial Damage | None | VA | Hillsdale | 2333 Maple Lane | 13 | 1 | NO | 0 | 3 | YES | 6490 | 1180 | 1180 | 4130 | Volkswagen | Jetta | 2002 | N |
284 | 48 | 498140 | 1997-05-15 00:00:00 | IN | 500/1000 | 2000 | 769.95 | 0 | 605486 | MALE | Masters | prof-specialty | movies | not-in-family | 0 | -44200 | 2015-01-19 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Ambulance | NY | Hillsdale | 1012 5th Lane | 16 | 2 | ? | 2 | 3 | NO | 60940 | 5540 | 11080 | 44320 | Audi | A3 | 2013 | Y |
275 | 41 | 498875 | 1996-10-26 00:00:00 | OH | 100/300 | 2000 | 1514.72 | 0 | 617970 | MALE | High School | transport-moving | board-games | own-child | 35700 | 0 | 2015-02-02 00:00:00 | Multi-vehicle Collision | Front Collision | Major Damage | Fire | NY | Northbrook | 7477 MLK Drive | 13 | 3 | YES | 0 | 1 | ? | 58300 | 5830 | 11660 | 40810 | Suburu | Legacy | 2007 | N |
153 | 34 | 798177 | 2006-03-04 00:00:00 | IL | 500/1000 | 1000 | 873.64 | 4000000 | 432934 | FEMALE | Associate | priv-house-serv | yachting | husband | 800 | 0 | 2015-01-30 00:00:00 | Multi-vehicle Collision | Front Collision | Minor Damage | Other | SC | Columbus | 9489 3rd St | 9 | 3 | NO | 2 | 1 | ? | 68400 | 11400 | 11400 | 45600 | Ford | F150 | 2007 | N |
134 | 32 | 614763 | 1991-01-02 00:00:00 | IL | 500/1000 | 500 | 1612.43 | 0 | 456762 | FEMALE | MD | other-service | yachting | own-child | 36400 | 0 | 2015-01-08 00:00:00 | Single Vehicle Collision | Side Collision | Total Loss | Fire | VA | Springfield | 2087 Apache Ave | 2 | 1 | ? | 2 | 1 | YES | 64240 | 11680 | 11680 | 40880 | BMW | 3 Series | 2015 | N |
31 | 36 | 679370 | 1999-08-15 00:00:00 | IL | 500/1000 | 2000 | 1318.24 | 9000000 | 601748 | FEMALE | College | prof-specialty | kayaking | not-in-family | 0 | -78600 | 2015-01-30 00:00:00 | Parked Car | ? | Trivial Damage | None | WV | Arlington | 5540 Sky St | 9 | 1 | NO | 0 | 1 | YES | 4700 | 940 | 470 | 3290 | Dodge | Neon | 2002 | N |
41 | 25 | 958857 | 1992-01-15 00:00:00 | IN | 100/300 | 1000 | 1226.83 | 0 | 607763 | FEMALE | College | exec-managerial | exercise | not-in-family | 0 | -56100 | 2015-01-07 00:00:00 | Multi-vehicle Collision | Side Collision | Major Damage | Other | SC | Columbus | 7238 2nd St | 12 | 3 | YES | 2 | 0 | ? | 45120 | 0 | 5640 | 39480 | Accura | MDX | 2011 | Y |
127 | 29 | 686816 | 1999-12-07 00:00:00 | OH | 250/500 | 2000 | 1326.44 | 5000000 | 436973 | FEMALE | High School | sales | board-games | own-child | 0 | 0 | 2015-02-24 00:00:00 | Multi-vehicle Collision | Front Collision | Total Loss | Fire | SC | Arlington | 8442 Britain Hwy | 12 | 2 | YES | 1 | 1 | ? | 66950 | 10300 | 10300 | 46350 | Saab | 93 | 1995 | N |
61 | 23 | 127754 | 1993-06-06 00:00:00 | IL | 250/500 | 2000 | 1136.83 | 4000000 | 471300 | FEMALE | Associate | tech-support | exercise | own-child | 0 | -62400 | 2015-02-02 00:00:00 | Single Vehicle Collision | Side Collision | Major Damage | Police | NY | Columbus | 1331 Britain Hwy | 14 | 1 | NO | 0 | 3 | ? | 98340 | 8940 | 17880 | 71520 | Honda | Accord | 2004 | Y |
207 | 42 | 918629 | 2000-10-03 00:00:00 | IL | 250/500 | 2000 | 1322.78 | 0 | 453277 | MALE | PhD | farming-fishing | yachting | own-child | 55200 | 0 | 2015-02-28 00:00:00 | Parked Car | ? | Trivial Damage | None | WV | Springfield | 5260 Francis Drive | 9 | 1 | NO | 0 | 1 | NO | 5900 | 590 | 590 | 4720 | BMW | X5 | 2007 | N |
219 | 43 | 731450 | 2010-12-29 00:00:00 | IN | 100/300 | 1000 | 1483.25 | 0 | 465100 | FEMALE | MD | exec-managerial | exercise | not-in-family | 90700 | -20800 | 2015-02-09 00:00:00 | Multi-vehicle Collision | Front Collision | Major Damage | Ambulance | NC | Riverwood | 1135 Solo Lane | 3 | 3 | NO | 1 | 1 | ? | 70680 | 5890 | 11780 | 53010 | Ford | Fusion | 2009 | N |
271 | 42 | 307447 | 1990-03-17 00:00:00 | IL | 100/300 | 500 | 1515.3 | 0 | 603248 | FEMALE | High School | machine-op-inspct | hiking | not-in-family | 0 | 0 | 2015-01-19 00:00:00 | Multi-vehicle Collision | Rear Collision | Total Loss | Ambulance | SC | Northbend | 9737 Solo Hwy | 21 | 3 | NO | 1 | 0 | NO | 93720 | 17040 | 8520 | 68160 | Mercedes | ML350 | 2005 | N |
80 | 25 | 992145 | 2012-03-01 00:00:00 | IL | 100/300 | 2000 | 1075.18 | 5000000 | 601112 | FEMALE | PhD | armed-forces | exercise | husband | 67700 | -58400 | 2015-02-21 00:00:00 | Vehicle Theft | ? | Minor Damage | None | OH | Northbrook | 3289 Britain Drive | 5 | 1 | NO | 2 | 0 | YES | 6930 | 1260 | 630 | 5040 | Toyota | Highlander | 2001 | N |
325 | 47 | 900628 | 2006-02-05 00:00:00 | IN | 500/1000 | 1000 | 1690.27 | 0 | 438830 | FEMALE | Associate | protective-serv | hiking | not-in-family | 61500 | 0 | 2015-01-14 00:00:00 | Single Vehicle Collision | Side Collision | Major Damage | Fire | VA | Springfield | 6550 Andromedia St | 11 | 1 | YES | 0 | 3 | NO | 72930 | 6630 | 6630 | 59670 | Dodge | RAM | 2006 | Y |
29 | 25 | 235220 | 2014-11-01 00:00:00 | IL | 250/500 | 2000 | 1352.83 | 0 | 464959 | MALE | Masters | farming-fishing | skydiving | own-child | 0 | -71700 | 2015-01-22 00:00:00 | Multi-vehicle Collision | Rear Collision | Minor Damage | Other | SC | Hillsdale | 1679 2nd Hwy | 4 | 4 | YES | 1 | 2 | YES | 64890 | 7210 | 7210 | 50470 | Nissan | Pathfinder | 2013 | Y |
295 | 48 | 740019 | 2009-06-17 00:00:00 | OH | 250/500 | 1000 | 1148.73 | 0 | 439787 | FEMALE | College | machine-op-inspct | kayaking | wife | 0 | 0 | 2015-02-22 00:00:00 | Parked Car | ? | Trivial Damage | None | WV | Columbus | 3998 Flute St | 6 | 1 | ? | 1 | 2 | YES | 5400 | 900 | 900 | 3600 | Saab | 95 | 1999 | N |
239 | 42 | 246882 | 1999-09-20 00:00:00 | IL | 100/300 | 1000 | 969.5 | 0 | 464839 | MALE | College | exec-managerial | reading | not-in-family | 0 | 0 | 2015-01-26 00:00:00 | Vehicle Theft | ? | Trivial Damage | None | NC | Northbrook | 2430 MLK Ave | 10 | 1 | NO | 0 | 0 | ? | 5600 | 700 | 700 | 4200 | Audi | A3 | 2007 | N |
months_as_customer | age | policy_number | policy_bind_date | policy_state | policy_csl | policy_deductable | policy_annual_premium | umbrella_limit | insured_zip | insured_sex | insured_education_level | insured_occupation | insured_hobbies | insured_relationship | capital-gains | capital-loss | incident_date | incident_type | collision_type | incident_severity | authorities_contacted | incident_state | incident_city | incident_location | incident_hour_of_the_day | number_of_vehicles_involved | property_damage | bodily_injuries | witnesses | police_report_available | total_claim_amount | injury_claim | property_claim | vehicle_claim | auto_make | auto_model | auto_year | fraud_reported |
---|
%md ## Data Exploration
We have several string (categorical) columns in our dataset, along with some ints and doubles.
display(df.dtypes)
_1 | _2 |
---|---|
months_as_customer | int |
age | int |
policy_number | int |
policy_bind_date | string |
policy_state | string |
policy_csl | string |
policy_deductable | int |
policy_annual_premium | double |
umbrella_limit | int |
insured_zip | int |
insured_sex | string |
insured_education_level | string |
insured_occupation | string |
insured_hobbies | string |
insured_relationship | string |
capital-gains | int |
capital-loss | int |
incident_date | string |
incident_type | string |
collision_type | string |
incident_severity | string |
authorities_contacted | string |
incident_state | string |
incident_city | string |
incident_location | string |
incident_hour_of_the_day | int |
number_of_vehicles_involved | int |
property_damage | string |
bodily_injuries | int |
witnesses | int |
police_report_available | string |
total_claim_amount | int |
injury_claim | int |
property_claim | int |
vehicle_claim | int |
auto_make | string |
auto_model | string |
auto_year | int |
fraud_reported | string |
%md Count the number of distinct categories in every categorical column (count distinct).
# Create a list of column names whose data type is string
stringColList = [name for name, dtype in df.dtypes if dtype == 'string']
print(stringColList)

from pyspark.sql.functions import countDistinct

# Count the distinct values in a column and record (column name, count)
distinctList = []
def countDistinctCats(colName):
    count = df.agg(countDistinct(colName)).collect()[0][0]
    distinctList.append((colName, count))

# Apply the function to every column in stringColList.
# A plain loop is used because map() is lazy in Python 3 and would not run the function.
for colName in stringColList:
    countDistinctCats(colName)
print(distinctList)
%md We have identified several columns with a large number of distinct values, most of them 900 or more. We will remove these columns from our dataset in the Data Processing step to improve model accuracy.
* policy number (1000 distinct)
* policy bind date (951 distinct; could be narrowed down to year/month to test the effect on model accuracy)
* insured zip (995 distinct)
* incident location (1000 distinct)
* incident date (60 distinct; excluded as well, but could be narrowed down to year/month to test the effect on model accuracy)
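The idea of narrowing a date column down to year/month can be sketched in plain Python. This is only an illustration and not part of the original notebook: the helper name `to_year_month` is hypothetical, and the sample values are policy bind dates copied from a few of the rows shown earlier. On the Spark DataFrame itself, `substring` from `pyspark.sql.functions` applied to the string column would likely do the same job, assuming every row follows the `yyyy-MM-dd HH:mm:ss` layout.

```python
from datetime import datetime

def to_year_month(timestamp_str):
    """Reduce a 'YYYY-MM-DD HH:MM:SS' string to 'YYYY-MM' (hypothetical helper)."""
    parsed = datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m")

# Policy bind dates copied from a few of the sample rows
bind_dates = [
    "2002-12-28 00:00:00",
    "2004-04-15 00:00:00",
    "2010-01-12 00:00:00",
    "2010-05-21 00:00:00",
    "2010-12-29 00:00:00",
]

year_months = [to_year_month(d) for d in bind_dates]
years = [ym[:4] for ym in year_months]

# Compare the number of distinct values at each level of granularity
print(len(set(bind_dates)), len(set(year_months)), len(set(years)))
```

Coarser buckets give the model fewer categories to handle, at the cost of losing day-level detail. Whether that trade helps accuracy is something to test rather than assume.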
%md Like most fraud datasets, our label distribution is skewed.
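The skew can be checked by counting the label values. The sketch below is a plain-Python illustration that uses only the 27 sample rows visible in the table excerpt above, so the share it prints describes that excerpt and not the full dataset. On the Spark DataFrame, `df.groupBy("fraud_reported").count()` would give the same kind of breakdown for all rows.

```python
from collections import Counter

# fraud_reported values from the 27 sample rows in the excerpt: 21 'N' and 6 'Y'
labels = ["N"] * 21 + ["Y"] * 6

counts = Counter(labels)
fraud_share = counts["Y"] / len(labels)

print(counts)
print("Share of claims labelled as fraud in the excerpt: %.1f%%" % (fraud_share * 100))
```

With a label this unbalanced, a model that always predicts "not fraud" would already look fairly accurate, which is worth keeping in mind when measuring the error rate later.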
%md We can quickly create one-click plots using Databricks built-in visualizations to understand our data better. Click 'Plot Options' to try out different chart types.
# Breakdown of average vehicle claim by insured's education level, grouped by fraud reported
display(df)
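The breakdown described in the comment above can also be computed explicitly instead of through the plot options. The following is a rough plain-Python sketch on a handful of rows copied from the sample table (education level, fraud label, vehicle claim); the variable names are illustrative only. On the Spark DataFrame, `df.groupBy("insured_education_level", "fraud_reported").avg("vehicle_claim")` should produce the equivalent result for the full dataset.

```python
from collections import defaultdict

# (insured_education_level, fraud_reported, vehicle_claim) copied from sample rows
rows = [
    ("College", "N", 42840),
    ("College", "N", 67050),
    ("College", "N", 38010),
    ("College", "Y", 39480),
    ("High School", "N", 34140),
    ("High School", "N", 30480),
    ("PhD", "Y", 50540),
    ("Masters", "Y", 44320),
    ("Masters", "N", 4620),
]

# Collect the vehicle claims for every (education level, fraud label) pair
claims_by_group = defaultdict(list)
for education, fraud, vehicle_claim in rows:
    claims_by_group[(education, fraud)].append(vehicle_claim)

# Average the claims within each group
averages = {group: sum(vals) / len(vals) for group, vals in claims_by_group.items()}

for (education, fraud), avg_claim in sorted(averages.items()):
    print(education, fraud, avg_claim)
```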
%md ## Data Processing
Next, we will clean up the data a little and prepare it for our machine learning model.
We will first remove the columns identified earlier, which have too many distinct categories and cannot be converted to numeric.