
Car Price Regression Prediction Project

Author: @小白创作中心

Source
1. CSDN: https://blog.csdn.net/qq_39297053/article/details/136857815

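Car price prediction aims to estimate the selling price of cars on the used-car market. It involves analyzing the factors that influence a car's price, such as brand, age, and performance parameters. Accurate price predictions matter both to sellers setting a price and to buyers planning a budget.

Project goals

The main goal of this project is to develop a model that predicts a car's market value from its features. The model should handle different types of data, both numerical and categorical, and strike a balance between prediction accuracy and computational cost.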
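One way to handle numerical and categorical columns in a single model is to route each group through its own preprocessing step. The sketch below is only an illustration of that idea, not the approach used later in this project: the `toy` table and its values are made up, and `ColumnTransformer` with one-hot encoding is an assumed alternative to the label encoding shown in the training section.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Tiny made-up sample; the values are illustrative, not taken from the dataset.
toy = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas", "gas", "diesel", "gas"],
    "horsepower": [111, 64, 154, 102, 72, 115],
    "curbweight": [2548, 2300, 2823, 2337, 2500, 2824],
    "price": [13495.0, 7775.0, 16500.0, 13950.0, 9500.0, 17450.0],
})
numeric_cols = ["horsepower", "curbweight"]
categorical_cols = ["fueltype"]
feature_cols = numeric_cols + categorical_cols

# Scale the numeric columns and one-hot encode the categorical ones.
preprocess = ColumnTransformer([
    ("num", MinMaxScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Chain preprocessing and the regressor so both are fitted together.
toy_model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
toy_model.fit(toy[feature_cols], toy["price"])
toy_predictions = toy_model.predict(toy[feature_cols])
print(toy_predictions.round(0))
```

Keeping the preprocessing inside the pipeline means the same transformations are applied automatically at prediction time.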

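Project applications

  • Used-car trading: help buyers and sellers understand the fair market value of a specific vehicle.
  • Vehicle appraisal: give appraisal companies an automated valuation tool.
  • Market analysis: analyze market trends and forecast future values.
  • Personal decision support: help individuals make better-informed decisions when buying or selling a car.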

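Dataset description

The dataset contains the following features, together with the target column, price:
car ID, symboling, car name, fuel type, aspiration, number of doors, body style, drive wheels, engine location, wheelbase, car length, car width, car height, curb weight, engine type, number of cylinders, engine size, fuel system, bore ratio, stroke, compression ratio, horsepower, peak RPM, city mpg, highway mpg.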

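Model selection and library dependencies

Models used in this project:

  • Linear regression
  • Decision tree regression
  • Random forest regression

Scientific computing libraries this project depends on: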

  • matplotlib==3.7.1
  • pandas==2.0.2
  • scikit_learn==1.2.2
  • seaborn==0.13.0

Detailed project code

#imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score  
data = pd.read_csv('car_price.csv')
data.head(10)  

1. Exploring the data

print("Rows: ",data.shape[0])
print("Columns: ",data.shape[1])  
Rows: 205
Columns: 26
data.info()  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 # Column Non-Null Count Dtype 
--- ------ -------------- ----- 
 0 car_ID 205 non-null int64 
 1 symboling 205 non-null int64 
 2 CarName 205 non-null object 
 3 fueltype 205 non-null object 
 4 aspiration 205 non-null object 
 5 doornumber 205 non-null object 
 6 carbody 205 non-null object 
 7 drivewheel 205 non-null object 
 8 enginelocation 205 non-null object 
 9 wheelbase 205 non-null float64
 10 carlength 205 non-null float64
 11 carwidth 205 non-null float64
 12 carheight 205 non-null float64
 13 curbweight 205 non-null int64 
 14 enginetype 205 non-null object 
 15 cylindernumber 205 non-null object 
 16 enginesize 205 non-null int64 
 17 fuelsystem 205 non-null object 
 18 boreratio 205 non-null float64
 19 stroke 205 non-null float64
 20 compressionratio 205 non-null float64
 21 horsepower 205 non-null int64 
 22 peakrpm 205 non-null int64 
 23 citympg 205 non-null int64 
 24 highwaympg 205 non-null int64 
 25 price 205 non-null float64
dtypes: float64(8), int64(8), object(10)
memory usage: 41.8+ KB
data.isna().sum()
# No missing values
car_ID 0
symboling 0
CarName 0
fueltype 0
aspiration 0
doornumber 0
carbody 0
drivewheel 0
enginelocation 0
wheelbase 0
carlength 0
carwidth 0
carheight 0
curbweight 0
enginetype 0
cylindernumber 0
enginesize 0
fuelsystem 0
boreratio 0
stroke 0
compressionratio 0
horsepower 0
peakrpm 0
citympg 0
highwaympg 0
price 0
dtype: int64
data.duplicated().sum()
# No duplicate rows

0

data.groupby("CarName").sum(numeric_only=True)  
# Drop car_ID and CarName because they add little value to the regression task
data = data.drop(['car_ID','CarName'],axis=1)
data.head(1)  
sns.histplot(data=data, x="price")  

plt.figure(figsize=(15,7))
sns.heatmap(data.corr(numeric_only=True), annot=True)
plt.title("Data Correlation",size=15)
plt.show()  

# Effect of fuel type on price
sns.barplot(x="fueltype", y="price", data=data)  
<Axes: xlabel='fueltype', ylabel='price'>

# Effect of body style on price
sns.boxplot(x ="carbody", y ="price", data = data)  
<Axes: xlabel='carbody', ylabel='price'>
# Effect of number of doors on price
sns.boxplot(x ="doornumber", y ="price", data = data)  
<Axes: xlabel='doornumber', ylabel='price'>

# Effect of drive wheels (FWD, RWD, AWD) on price
sns.boxplot(x ="drivewheel", y ="price", data = data)  
<Axes: xlabel='drivewheel', ylabel='price'>

# Plot pairwise relationships between the most correlated attributes in the heatmap
columns=data[['wheelbase','carlength','carwidth','curbweight','price']]
sns.pairplot(columns)
plt.show()
#linear relationship

columns=data[['horsepower','citympg','highwaympg','price']]
sns.pairplot(columns)
plt.show()
#linear relationship  

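The following groups of attributes show linear relationships:
1. Wheelbase, car length, car width, curb weight, and price (essentially all the physical attributes)
2. Horsepower, city mpg, highway mpg, and price (essentially all the attributes related to vehicle power)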
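A linear relationship can be positive or negative, and a pair plot does not quantify either. The sketch below ranks features by their Pearson correlation with price. It runs on synthetic data generated here purely for illustration (the coefficients are assumptions, not estimates from car_price.csv); in the project the same last two lines could be applied to `data` instead of `demo`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
curbweight = rng.normal(2500, 500, n)

# Synthetic stand-in: price rises with weight, fuel economy falls with weight.
demo = pd.DataFrame({
    "curbweight": curbweight,
    "highwaympg": 60 - 0.012 * curbweight + rng.normal(0, 2, n),
    "price": 8 * curbweight - 7000 + rng.normal(0, 1500, n),
})

# Pearson correlation of every numeric feature with price, sorted from most
# negative to most positive.
price_corr = demo.corr(numeric_only=True)["price"].drop("price").sort_values()
print(price_corr)
```

In this synthetic example the fuel-economy column correlates negatively with price while curb weight correlates positively, so both count as linear relationships but point in opposite directions.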

2. Training the models

# Label-encode each categorical column once, keeping an index -> label mapping per column.
# (Encoding 'fuelsystem' a second time would overwrite its mapping with integers.)
categorical_columns = ['fueltype', 'aspiration', 'doornumber', 'carbody', 'drivewheel',
                       'enginelocation', 'fuelsystem', 'enginetype', 'cylindernumber']
label_maps = {}
for column in categorical_columns:
    encoder = LabelEncoder()
    data[column] = encoder.fit_transform(data[column])
    label_maps[column] = dict(enumerate(encoder.classes_))
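LabelEncoder assigns integers in sorted label order, so for columns whose labels are number words the codes do not follow the numeric order. For such columns, an explicit mapping keeps the numeric meaning. The sketch below is a possible alternative, not the method used in this project; the spellings in `WORD_TO_NUMBER` are an assumption and should be checked against `data['cylindernumber'].unique()` and `data['doornumber'].unique()` first.

```python
import pandas as pd

# Assumed spellings of the count words found in the two columns.
WORD_TO_NUMBER = {"two": 2, "three": 3, "four": 4, "five": 5,
                  "six": 6, "eight": 8, "twelve": 12}

# Small illustrative sample with the same column names as the dataset.
sample = pd.DataFrame({
    "doornumber": ["two", "four", "four"],
    "cylindernumber": ["four", "six", "twelve"],
})

# Replace each word with the number it names; unmapped words become NaN,
# which makes unexpected spellings easy to spot.
for column in ["doornumber", "cylindernumber"]:
    sample[column] = sample[column].map(WORD_TO_NUMBER)

print(sample)
```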
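x = data.drop('price', axis=1)
y = data['price']
# Note: the scaled array X is computed here but never used; the split below passes the unscaled x.
scaler = MinMaxScaler(copy=True, feature_range=(0, 1))
X = scaler.fit_transform(x)
# Train/test split; an integer test_size holds out exactly 30 rows for testing
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=30,random_state=0)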
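In the code above the scaler is fitted on all rows before the split, and its output X is not what gets passed to train_test_split. If scaling is wanted, one common pattern is to split first and put the scaler in a pipeline so that it only sees the training rows. The sketch below illustrates this on synthetic data; `features`, `target`, and `scaled_lr` are hypothetical names, and the numbers have nothing to do with the car dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for x and y: 205 rows, 3 numeric features.
features = rng.uniform(0, 100, size=(205, 3))
target = features @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 1, 205)

# Split first; test_size=30 holds out exactly 30 rows, as in the project code.
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=30, random_state=0
)

# The pipeline fits MinMaxScaler on the training rows only, then reuses the
# fitted scaling when scoring the test rows.
scaled_lr = make_pipeline(MinMaxScaler(), LinearRegression())
scaled_lr.fit(f_train, t_train)

print(f_train.shape, f_test.shape)
print("test r2_score:", scaled_lr.score(f_test, t_test))
```

Tree-based models split on thresholds and are largely insensitive to this kind of feature scaling, so the step matters mainly for models that depend on feature magnitudes.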
  1. Random forest regression
rf = RandomForestRegressor(n_estimators=100,max_depth=5, random_state=33)
rf.fit(x_train, y_train)  
print("Training r2_score: ",rf.score(x_train, y_train))
print("Testing r2_score: ",rf.score(x_test, y_test))  
Training r2_score: 0.9753559007565417
Testing r2_score: 0.87367804775233
  2. Decision tree regression
dt = DecisionTreeRegressor( max_depth=5,random_state=33)
dt.fit(x_train, y_train)  
print('Training r2_score: ' , dt.score(x_train, y_train))
print('Testing r2_score: ' , dt.score(x_test, y_test))  
Training r2_score: 0.9735394081185511
Testing r2_score: 0.8226507572837073
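Both tree models score about 0.97 on the training rows but 0.82 to 0.87 on the 30 held-out rows, and a single 30-row test set gives a fairly noisy estimate. K-fold cross-validation is one way to get a more stable figure. The sketch below is an assumption about how that could look, run on synthetic data from `make_regression` rather than on the car dataset; in the project, `x` and `y` would take the place of `X_demo` and `y_demo`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression problem with the same number of rows as the dataset.
X_demo, y_demo = make_regression(
    n_samples=205, n_features=10, n_informative=5, noise=10.0, random_state=0
)

# Same hyperparameters as the random forest in this project.
forest = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=33)

# Five folds: each row is used for testing exactly once.
cv_scores = cross_val_score(forest, X_demo, y_demo, cv=5, scoring="r2")
print("mean r2:", cv_scores.mean(), "std:", cv_scores.std())
```

The spread across folds indicates how much a single train/test split can move the reported score.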

  3. Linear regression

def evaluate(model,x_train , y_train, x_test , y_test, y_predict):
    print(f'train r2_score:{r2_score(y_train, model.predict(x_train))}' )
    print(f'test r2_score : {r2_score(y_test, y_predict)}')  
model = LinearRegression()
model.fit(x_train,y_train)
y_predict=model.predict(x_test)
evaluate(model,x_train , y_train, x_test , y_test, y_predict)  
train r2_score:0.889157847638672
test r2_score : 0.7289860743863041
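The imports at the top include mean_squared_error and mean_absolute_error, but only R² is reported. Error metrics expressed in price units can be easier to interpret. The sketch below shows how they could be computed; the `actual` and `predicted` arrays are made-up numbers for illustration, and in the project `y_test` and `y_predict` would take their place.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up prices and predictions, for illustration only.
actual = np.array([10000.0, 15000.0, 20000.0, 30000.0])
predicted = np.array([11000.0, 14000.0, 22000.0, 28000.0])

# Mean absolute error: average size of the error, in price units.
mae = mean_absolute_error(actual, predicted)
# Root mean squared error: like MAE, but penalizes large errors more.
rmse = np.sqrt(mean_squared_error(actual, predicted))
# R^2: share of the variance in the actual prices explained by the predictions.
r2 = r2_score(actual, predicted)

print(f"MAE:  {mae:.2f}")    # MAE:  1500.00
print(f"RMSE: {rmse:.2f}")   # RMSE: 1581.14
print(f"R2:   {r2:.4f}")     # R2:   0.9543
```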

Project resource download

For details, see the Car Price Regression Prediction Project on VenusAI (aideeplearning.cn)
