The German Credit Data contains data on 20 variables, plus a classification of whether each applicant is considered a Good or a Bad credit risk, for 1000 loan applicants. A predictive model developed on this data is expected to give a bank manager guidance when making a decision.

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers.]

In our data science course this morning, we used random forests to improve predictions on the German Credit Dataset. The dataset is

Almost all variables are treated as numeric, but most of them are actually factors:

(etc.) Let us convert the categorical variables into factors.
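If a category stored as an integer code is kept as a number, the model assumes the codes are ordered and equally spaced. In R, `factor()` removes that assumption. The minimal Python sketch below shows the same idea with one-hot (indicator) encoding. The function `one_hot` and the codes are hypothetical and not taken from the dataset. It also keeps every level, whereas R's default treatment contrasts in a regression drop one reference level.

```python
def one_hot(values, levels=None):
    """Return (levels, rows): one 0/1 indicator list per value.

    Hypothetical helper: levels default to the sorted distinct values."""
    if levels is None:
        levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        row[index[v]] = 1  # switch on the column of this value's level
        rows.append(row)
    return levels, rows


# Made-up integer codes for a categorical variable such as "purpose".
purpose_codes = [3, 0, 3, 1]
levels, encoded = one_hot(purpose_codes)
print(levels)   # [0, 1, 3]
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```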

Let us now create our training/calibration and validation/testing datasets, with proportion 1/3-2/3
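A random split of the row indices can be sketched as follows. This is an illustration in Python rather than the post's R code. It assumes that one third of the 1000 rows is held out for validation, and the name `split_indices` and the fixed `seed` are hypothetical choices made for reproducibility.

```python
import random


def split_indices(n, test_fraction=1 / 3, seed=0):
    """Shuffle 0..n-1 and return (train, test) index lists."""
    rng = random.Random(seed)  # private generator, so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # 667 333
```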

The first model we can fit is a logistic regression, on selected covariates
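The post fits this model in R. As a language-neutral illustration, here is a minimal Python sketch of logistic regression fitted by batch gradient descent on the mean log-loss. The learning rate `lr` and the iteration count `n_iter` are arbitrary assumptions, and the model has no regularisation. R's `glm` maximises the same likelihood but uses iteratively reweighted least squares, so this is not the estimator behind the post's results.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean log-loss; returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus observed 0/1 label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(model, X):
    """Estimated P(y = 1) for each row of X."""
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Toy data (not the German Credit data): one covariate, labels switch at 1.5.
model = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print(predict_proba(model, [[0.5], [2.5]]))
```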

Based on that model, it is possible to draw the ROC curve and to compute the AUC (on the validation dataset).
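The ROC curve plots the true positive rate against the false positive rate as the classification threshold moves from high to low. The AUC is the area under that curve. The following is a minimal Python sketch, independent of the R packages the post may have used. It assumes `labels` are coded 1 for the class being scored and 0 otherwise, groups tied scores into a single step, and integrates with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points, sweeping the threshold from the highest score down."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every observation tied at score s before emitting a point.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
print(round(auc(pts), 4))  # 0.8889
```

As a check, 8 of the 9 (positive, negative) pairs in the toy example are ranked correctly, and 8/9 is about 0.8889.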

An alternative is to consider a logistic regression on all explanatory variables


We might overfit here, which we should be able to see on the ROC curve.

There is a slight improvement here, compared with the previous model, where only five explanatory variables were considered.

Consider now a regression tree (on all covariates).
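A tree is grown by repeatedly picking the split that makes the child nodes as pure as possible. The minimal Python sketch below finds the best threshold on one numeric covariate using weighted Gini impurity, which is also the default splitting index in R's `rpart` for classification. The data and the helper names `gini` and `best_split` are made up for illustration. A full tree would apply this search to every covariate and then recurse into each child node.

```python
def gini(labels):
    """Gini impurity 2p(1 - p) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(x, y):
    """Best (weighted Gini, threshold t) for the rule x <= t.

    Returns (inf, None) if x has a single distinct value."""
    best = (float("inf"), None)
    n = len(y)
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, t)
    return best


# Made-up durations (months) and bad-risk indicators.
print(best_split([6, 12, 12, 24, 36, 48], [0, 0, 0, 1, 1, 1]))  # (0.0, 12)
```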

We can visualize the tree using

The ROC curve for that model is

As expected, a single tree performs worse than a logistic regression. A natural idea is then to grow several trees on bootstrap samples and to aggregate their predictions.
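The bootstrap-and-aggregate idea can be sketched as a toy random-forest-style ensemble in Python. Each "tree" is a single split (a stump) fitted on a bootstrap sample of the rows and restricted to a random subset of `m_features` covariates. The ensemble averages the stumps' estimated probabilities. All names and defaults here are hypothetical. Real random forests, such as R's `randomForest`, grow deep trees, draw a feature subset at every split, and by default classify by majority vote.

```python
import random


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def fit_stump(X, y, features):
    """Depth-1 tree: (feature, threshold, P(1) left, P(1) right) with lowest weighted Gini."""
    best = None
    n = len(y)
    for j in features:
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no usable split (constant features): predict the sample mean
        p = sum(y) / n
        return (None, None, p, p)
    return best[1:]


def predict_stump(stump, row):
    j, t, p_left, p_right = stump
    if j is None:
        return p_left
    return p_left if row[j] <= t else p_right


def fit_forest(X, y, n_trees=50, m_features=1, seed=0):
    """Bootstrap the rows and draw m_features random covariates for each stump."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        feats = rng.sample(range(p), m_features)
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict_forest(forest, row):
    """Aggregate by averaging the stumps' estimated P(y = 1)."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Toy data: column 0 separates the classes, column 1 is weakly related noise.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 1]), predict_forest(forest, [5, 0]))
```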

Here, the random forest is (slightly) better than the logistic regression. In fact, if we create many training/validation samples and compare the AUCs, we can observe that, on average, random forests perform better than logistic regressions.
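The comparison over many training/validation samples amounts to repeated random subsampling. Split the data, fit both models on the training part, compute each model's AUC on the validation part, and average over rounds. Below is a minimal Python sketch of that loop. The two "models" are hypothetical stand-ins on synthetic data (one scores with an informative covariate, the other with pure noise), not the post's logistic regression and random forest. AUC is computed here as the probability that a random positive outscores a random negative, which equals the area under the ROC curve.

```python
import random


def auc_rank(scores, labels):
    """P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def compare_models(X, y, fit_a, fit_b, n_rounds=100, test_fraction=1 / 3, seed=0):
    """Mean validation AUC of two models over repeated random splits.

    fit_a / fit_b take (X_train, y_train) and return a function row -> score."""
    rng = random.Random(seed)
    n = len(y)
    n_test = round(n * test_fraction)
    totals, used = [0.0, 0.0], 0
    for _ in range(n_rounds):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in test], [y[i] for i in test]
        if len(set(yte)) < 2:
            continue  # AUC is undefined without both classes
        used += 1
        for k, fit in enumerate((fit_a, fit_b)):
            score = fit(Xtr, ytr)
            totals[k] += auc_rank([score(row) for row in Xte], yte)
    if used == 0:
        raise ValueError("no validation set contained both classes")
    return totals[0] / used, totals[1] / used


def fit_informative(X_train, y_train):  # hypothetical model: score by column 0
    return lambda row: row[0]


def fit_noise(X_train, y_train):  # hypothetical model: score by column 1
    return lambda row: row[1]


# Synthetic data: column 0 is shifted by the label, column 1 is noise.
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(300)]
X = [[yi + rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
print(compare_models(X, y, fit_informative, fit_noise, n_rounds=20))
```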

Statlog German Credit

The dataset contains data on past credit applicants. The applicants are rated as good or bad. Models of this data can be used to determine if new applicants present a good or bad credit risk.

Keywords
datasets
Usage
Details

The use of a cost matrix is suggested for this dataset. It is worse to classify a customer as good when they are bad (cost = 5) than to classify a customer as bad when they are good (cost = 1).
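With this cost matrix, predicting "good" has an expected cost of 5 · P(bad), and predicting "bad" has an expected cost of 1 · (1 − P(bad)). So "bad" is the cheaper decision as soon as P(bad) > 1/6, not at the usual 1/2. The minimal Python sketch below encodes that rule and the total misclassification cost. The function names are hypothetical, and it assumes an estimate of P(bad) is already available from some model.

```python
COST_GOOD_WHEN_BAD = 5  # predicting "good" for a bad customer
COST_BAD_WHEN_GOOD = 1  # predicting "bad" for a good customer

# Decision threshold on P(bad): 1 / (1 + 5) = 1/6.
THRESHOLD = COST_BAD_WHEN_GOOD / (COST_BAD_WHEN_GOOD + COST_GOOD_WHEN_BAD)


def total_cost(predicted, actual):
    """Sum of misclassification costs; correct decisions cost 0."""
    cost = 0
    for p, a in zip(predicted, actual):
        if p == "good" and a == "bad":
            cost += COST_GOOD_WHEN_BAD
        elif p == "bad" and a == "good":
            cost += COST_BAD_WHEN_GOOD
    return cost


def decide(p_bad):
    """Label with the lower expected cost, given an estimated P(bad)."""
    return "bad" if COST_GOOD_WHEN_BAD * p_bad > COST_BAD_WHEN_GOOD * (1 - p_bad) else "good"


print(decide(0.2), decide(0.1))  # bad good
print(total_cost(["good", "bad", "good"], ["bad", "good", "good"]))  # 6
```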


Format

A data frame containing 1,000 observations on 21 variables.

status

factor variable indicating the status of the existing checking account, with levels .. < 0 DM, 0 <= .. < 200 DM, .. >= 200 DM/salary for at least 1 year and no checking account.

duration

duration in months.

credit_history

factor variable indicating credit history, with levels no credits taken/all credits paid back duly, all credits at this bank paid back duly, existing credits paid back duly till now, delay in paying off in the past and critical account/other credits existing.

purpose

factor variable indicating the credit's purpose, with levels car (new), car (used), furniture/equipment, radio/television, domestic appliances, repairs, education, retraining, business and others.

amount


credit amount.

savings

factor. savings account/bonds, with levels .. < 100 DM, 100 <= .. < 500 DM, 500 <= .. < 1000 DM, .. >= 1000 DM and unknown/no savings account.

employment_duration

ordered factor indicating the duration of the current employment, with levels unemployed, .. < 1 year, 1 <= .. < 4 years, 4 <= .. < 7 years and .. >= 7 years.

installment_rate

installment rate in percentage of disposable income.

personal_status_sex

factor variable indicating personal status and sex, with levels male:divorced/separated, female:divorced/separated/married, male:single, male:married/widowed and female:single.

other_debtors

factor. Other debtors, with levels none, co-applicant and guarantor.

present_residence

time since the client has lived at the present residence.

property

factor variable indicating the client's highest valued property, with levels real estate, building society savings agreement/life insurance, car or other and unknown/no property.

age

client's age.

other_installment_plans

factor variable indicating other installment plans, with levels bank, stores and none.

housing

factor variable indicating housing, with levels rent, own and for free.

number_credits


number of existing credits at this bank.

job

factor indicating employment status, with levels unemployed/unskilled - non-resident, unskilled - resident, skilled employee/official and management/self-employed/highly qualified employee/officer.

people_liable

number of people being liable to provide maintenance for.

telephone

binary variable indicating if the customer has a registered telephone number.

foreign_worker

binary variable indicating if the customer is a foreign worker.


credit_risk

binary variable indicating credit risk, with levels good and bad.


Aliases
  • GermanCredit
Examples
Documentation reproduced from package evtree, version 1.0-8, License: GPL-2 | GPL-3
