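Objectives:
- Master the implementation process of the K-nearest neighbors (KNN) algorithm
- Know the distance formulas used by the K-nearest neighbors algorithm
- Know the hyperparameter K of the K-nearest neighbors algorithm and the issues in choosing its value
- Know how a KD-tree performs search
- Apply KNeighborsClassifier to perform classification
- Know the advantages and disadvantages of the K-nearest neighbors algorithm
- Know how cross-validation is carried out
- Know how hyperparameter search is carried out
- Apply GridSearchCV to tune algorithm parameters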
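1 Introduction to the K-nearest neighbors algorithm
1.1 What is KNN
- Concept
If most of the k samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to one class, then the sample is assigned to that class as well.
- Distance formula
- Euclidean distance
2 Getting started with the K-nearest neighbors API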
2.1 Scikit-learn
2.2 K-nearest neighbors API
- API
sklearn.neighbors.KNeighborsClassifier(n_neighbors=5)
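- n_neighbors: int, optional (default = 5), the number of neighbors used by default for kneighbors queries
- Implementation process
- Compute the distance between the current point and every point in the labeled dataset
- Sort the points in order of increasing distance
- Select the k points closest to the current point
- Count how often each class occurs among those k points
- Return the most frequent class among those k points as the predicted class of the current point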
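The five steps above can be sketched in plain Python. This is only an illustrative sketch, not scikit-learn's implementation: the function name `knn_predict` and the toy data are assumptions, and Euclidean distance is assumed as the metric.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every labeled point (Euclidean distance assumed).
    distances = [(math.dist(point, query), label) for point, label in zip(train_x, train_y)]
    # Step 2: sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the k closest points.
    nearest = distances[:k]
    # Step 4: count how often each class appears among them.
    votes = Counter(label for _, label in nearest)
    # Step 5: return the most frequent class.
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional training data with two classes.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0, 0, 1, 1]
prediction_low = knn_predict(train_x, train_y, [0.4], k=3)
prediction_high = knn_predict(train_x, train_y, [2.6], k=3)
print(prediction_low, prediction_high)
```

An odd k is used here so that a two-class vote cannot end in a tie.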
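- Example
from sklearn.neighbors import KNeighborsClassifier

# Four samples with one feature each, and their class labels
x = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
# Classifier that votes among the 2 nearest neighbors
estimator = KNeighborsClassifier(n_neighbors=2)
estimator.fit(x, y)
# Predict the class of a new sample
print(estimator.predict([[1]]))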
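3 Distance metrics
3.1 Euclidean distance
Euclidean distance is the most intuitive distance measure. The distance between two points in space taught in primary, middle, and high school is generally the Euclidean distance.
3.2 Manhattan distance
To drive from one intersection to another in Manhattan, the driving distance is clearly not the straight-line distance between the two points. This actual driving distance is the "Manhattan distance", also called the "city block distance".
3.3 Chebyshev distance
In chess, the king can move horizontally, vertically, or diagonally, so in one move it can reach any of the 8 adjacent squares. What is the minimum number of moves the king needs to go from square (x1, y1) to square (x2, y2)? That number is the Chebyshev distance.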
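3.4 Minkowski distance
The Minkowski distance is not a single distance but a family of distances, a general formulation that covers several distance formulas.
The Minkowski distance between two n-dimensional variables a(x11, x12, …, x1n) and b(x21, x22, …, x2n) is defined as:
$$d_{12} = \left( \sum_{k=1}^{n} |x_{1k} - x_{2k}|^{p} \right)^{1/p}$$
where p is a variable parameter:
when p = 1, it is the Manhattan distance;
when p = 2, it is the Euclidean distance;
as p → ∞, it is the Chebyshev distance.
Depending on the value of p, the Minkowski distance can therefore represent a whole class of distances.
- Drawbacks
Example: take two-dimensional samples (height in cm, weight in kg) and three samples a(180, 50), b(190, 50), c(180, 60). The Minkowski distance between a and b (whether Manhattan, Euclidean, or Chebyshev) equals the Minkowski distance between a and c. In reality, however, 10 cm of height cannot be equated with 10 kg of weight.
- It treats the scales (that is, the "units") of all components as if they were the same;
- It does not take into account that the distributions of the components (expectation, variance, and so on) may differ.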
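The height and weight example can be checked numerically. The sketch below is illustrative only: the helper name `minkowski` is an assumption, and NumPy is assumed to be available. It evaluates the Minkowski distance for p = 1, p = 2, and the limit p → ∞ on the three samples.

```python
import numpy as np


def minkowski(u, v, p):
    """Minkowski distance of order p between two vectors."""
    diff = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    if np.isinf(p):
        return float(diff.max())  # Chebyshev distance, the limit p -> infinity
    return float((diff ** p).sum() ** (1.0 / p))


a, b, c = (180, 50), (190, 50), (180, 60)  # (height in cm, weight in kg)
results = {}
for name, p in [("Manhattan", 1), ("Euclidean", 2), ("Chebyshev", np.inf)]:
    # Distance a-b differs only in height, a-c only in weight.
    results[name] = (minkowski(a, b, p), minkowski(a, c, p))
    print(name, results[name])
```

For every p the two distances coincide, which is exactly the drawback described above: the metric cannot tell a 10 cm difference from a 10 kg difference.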
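3.5 Standardized Euclidean distance
The standardized Euclidean distance is an improvement that addresses the drawbacks of the Euclidean distance.
Idea: since the components of the data are distributed differently, first "standardize" every component so that they all have equal mean and variance. Assuming the sample set X has mean m and standard deviation s, the "standardized variable" of X is:
$$X^{*} = \frac{X - m}{s}$$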
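As a rough illustration of this idea, the sketch below standardizes each component and then measures the Euclidean distance. The sample values and the helper name `standardized_euclidean` are hypothetical, and the population standard deviation (NumPy's default) is assumed for s.

```python
import numpy as np

# Hypothetical samples: columns are height (cm) and weight (kg).
X = np.array([[180.0, 50.0], [190.0, 50.0], [180.0, 60.0], [170.0, 80.0]])

mean = X.mean(axis=0)      # m for each component
std = X.std(axis=0)        # s for each component (population standard deviation)
X_std = (X - mean) / std   # standardized variables


def standardized_euclidean(u, v, s):
    """Euclidean distance after dividing each component difference by its standard deviation."""
    return float(np.sqrt((((u - v) / s) ** 2).sum()))


d_ab = standardized_euclidean(X[0], X[1], std)  # samples differ by 10 cm of height
d_ac = standardized_euclidean(X[0], X[2], std)  # samples differ by 10 kg of weight
print(d_ab, d_ac)
```

Because weight varies more than height in this made-up sample, the 10 kg difference counts for less than the 10 cm difference once both are expressed in standard deviations.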
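3.6 Cosine distance
In geometry, the cosine of the angle between two vectors measures the difference in their directions; machine learning borrows this concept to measure the difference between sample vectors.
The cosine takes values in [-1, 1]. The larger the cosine, the smaller the angle between the two vectors; the smaller the cosine, the larger the angle.
When the two vectors point in the same direction, the cosine takes its maximum value 1; when they point in exactly opposite directions, it takes its minimum value -1.
The cosine of the angle between two n-dimensional sample points a(x11, x12, …, x1n) and b(x21, x22, …, x2n) is:
$$\cos\theta = \frac{a \cdot b}{|a|\,|b|}$$
That is:
$$\cos\theta = \frac{\sum_{k=1}^{n} x_{1k} x_{2k}}{\sqrt{\sum_{k=1}^{n} x_{1k}^{2}}\,\sqrt{\sum_{k=1}^{n} x_{2k}^{2}}}$$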
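A small sketch of the cosine computation follows, assuming NumPy and non-zero input vectors. The function name `cosine_similarity` and the example vectors are illustrative choices, not part of the original material.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Dot product divided by the product of the two vector lengths.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


same_direction = cosine_similarity([1, 2], [2, 4])    # parallel vectors
opposite = cosine_similarity([1, 2], [-1, -2])        # exactly opposite vectors
orthogonal = cosine_similarity([1, 0], [0, 3])        # perpendicular vectors
print(same_direction, opposite, orthogonal)
```

Up to floating-point rounding, the three cases land at the maximum, the minimum, and the midpoint of the [-1, 1] range. The vector lengths play no role, only the directions do.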
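3.7 Jaccard distance
- Jaccard coefficient
The proportion of the elements in the intersection of two sets A and B relative to their union is called the Jaccard similarity coefficient of the two sets, denoted J(A, B).
- Jaccard distance
The opposite of the Jaccard similarity coefficient: it measures how distinct two sets are by the proportion of elements that differ between them among all elements.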
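Both quantities map directly onto Python set operations. The sketch below reads the coefficient as |A ∩ B| / |A ∪ B| and the distance as one minus that value. The function names and the convention for two empty sets are assumptions made for the example.

```python
def jaccard_similarity(a, b):
    """Share of the union of two sets that is also in their intersection."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Share of the union made up of elements that belong to only one of the sets."""
    return 1.0 - jaccard_similarity(a, b)


A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
similarity = jaccard_similarity(A, B)  # 2 shared elements out of 6 in the union
distance = jaccard_distance(A, B)      # 4 differing elements out of 6 in the union
print(similarity, distance)
```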
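3.8 Hamming distance (omitted)
3.9 Mahalanobis distance (omitted)
4 Choosing the value of K
- Effect of K
K too small: the prediction is easily affected by outliers; the overall model becomes more complex and is prone to overfitting.
K too large: the prediction is affected by class imbalance in the samples; training instances far from the input instance (dissimilar ones) also influence the prediction and can make it wrong. A larger K means the overall model becomes simpler.
In practice, K is usually set to a fairly small value, and the optimal K is chosen with a method such as cross-validation (a training set and a validation set). This simple classifier can be generalized: kernel methods extend the linear model to the nonlinear case by mapping the low-dimensional dataset into a high-dimensional feature space.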
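- Approximation error
The training error on the existing training set; it focuses on the training set. If the approximation error is too small, overfitting may occur: the model predicts the existing training set very well but makes predictions with large deviations on unseen test samples. The model itself is then not the one closest to the best model.
- Estimation error
Can be understood as the test error on the test set; it focuses on the test set. A small estimation error means the model predicts unseen data well and is itself the one closest to the best model.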
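One way to see both kinds of error is to compare training accuracy and test accuracy for several values of K. The sketch below does this on the iris data used later in these notes. The chosen K values, the 25% test split, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=22
)

scores = {}
for k in (1, 5, 15, 50, 100):
    model = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    train_acc = model.score(x_train, y_train)  # high value = small approximation error
    test_acc = model.score(x_test, y_test)     # high value = small estimation error
    scores[k] = (train_acc, test_acc)
    print(f"k={k:3d}  train accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}")
```

With K = 1 every training point is its own nearest neighbor, so the training accuracy is at or very near its maximum. That says little about how well the model handles unseen data. With very large K the model becomes so simple that both accuracies tend to drop.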
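5 KD-tree
**The simplest implementation of the k-nearest neighbors method is a linear scan (exhaustive search): compute the distance between the input instance and every training instance, store the results, and then look up the K nearest neighbors.** When the training set is large, this computation is very time-consuming. To make kNN search more efficient, the training data can be stored in a special structure that reduces the number of distance computations.
5.1 Introduction to the KD-tree
When the dataset is large, this computational cost is very high: for a dataset with N samples and D features, the algorithmic complexity is O(DN^2).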
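The contrast between a linear scan and a tree-based index can be tried with scikit-learn's `sklearn.neighbors.KDTree` class. The sketch below uses random, purely hypothetical data and checks that both approaches find the same nearest distances. It does not measure speed.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 3))   # hypothetical training set: N = 1000, D = 3
query = rng.random((1, 3))       # a single query point
k = 5

# Linear scan: compute the distance to every training point, then sort.
brute_dist = np.sort(np.linalg.norm(points - query, axis=1))[:k]

# KD-tree: build the index once, then query it for the k nearest neighbors.
tree = KDTree(points, leaf_size=30)
tree_dist, tree_idx = tree.query(query, k=k)

print(brute_dist)
print(tree_dist[0])
```

The tree is built once and can then answer many queries, each of which only needs to visit part of the data. In `KNeighborsClassifier` the search strategy is selected with the `algorithm` parameter.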
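5.2 Construction method
- Construct the root node so that it corresponds to the hyperrectangular region of the k-dimensional space that contains all instance points
- Recursively partition the k-dimensional space, generating child nodes
- Stop when a subregion contains no instances (the node at which the process stops is a leaf node). During this process, instances are stored at the corresponding nodes.
- Usually the coordinate axes are chosen cyclically to split the space, and the median of the training instances along the chosen axis is used as the split point, so that the resulting kd-tree is balanced
When building a KD-tree, two key problems need to be solved: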
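6 Case study
6.1 Predicting iris species
The Iris dataset is a commonly used dataset for classification experiments, collected and compiled by Fisher in 1936.
6.2 Introduction to scikit-learn datasets
- scikit-learn dataset API
- sklearn.datasets
datasets.load_*(): loads a small dataset that ships inside datasets
datasets.fetch_*(data_home=None): fetches a large dataset that has to be downloaded from the network; the first parameter, data_home, is the directory the dataset is downloaded to, by default ~/scikit_learn_data/
- Return value of scikit-learn datasets
- load and fetch return data of type datasets.base.Bunch (a dictionary-like format; in recent scikit-learn releases the class is sklearn.utils.Bunch)
- data: feature data, a two-dimensional numpy.ndarray of shape [n_samples * n_features]
- target: label array, a one-dimensional numpy.ndarray of length n_samples
- DESCR: description of the data
- feature_names: feature names; the news data, handwritten digits, and regression datasets do not have them
- target_names: label names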
from sklearn.datasets import load_iris

# Load the dataset and inspect the fields of the returned Bunch
iris = load_iris()
print("Return value of the iris dataset:\n", iris)
print("Iris feature values:\n", iris["data"])
print("Iris target values:\n", iris.target)
print("Iris feature names:\n", iris.feature_names)
print("Iris target names:\n", iris.target_names)
print("Iris description:\n", iris.DESCR)
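- Inspecting the data distribution
Create a few plots to see how the classes are separated by the features. In the ideal case, the label classes would be perfectly separated by one or more feature pairs. In the real world, this ideal case rarely occurs.
- seaborn
Seaborn is a higher-level API wrapper built on the Matplotlib core library that makes it easy to draw more attractive figures. Reference: http://seaborn.pydata.org/
seaborn.lmplot() is a very useful method: when drawing a two-dimensional scatter plot it automatically fits a regression.
- x and y in sns.lmplot() are the column names for the horizontal and vertical axes,
- data= is the associated dataset,
- hue= groups the display by species, that is, by the flower class,
- fit_reg= controls whether a linear fit is performed.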
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Put the features into a DataFrame (iris comes from load_iris() above)
iris_d = pd.DataFrame(iris['data'], columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
iris_d['Species'] = iris.target

def plot_iris(df, col1, col2):
    # Scatter plot of two features, colored by species, without a regression line
    sns.lmplot(x=col1, y=col2, data=df, hue="Species", fit_reg=False)
    plt.xlabel(col1)
    plt.ylabel(col2)
    plt.title('Iris species distribution')
    plt.show()

plot_iris(iris_d, 'Petal_Width', 'Sepal_Length')
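- Splitting the dataset
- Parts
- Training set: used for training and building the model, typically 70%, 75%, or 80% of the data.
- Test set: used to evaluate whether the model is effective, typically 30%, 25%, or 20% of the data.
- API
sklearn.model_selection.train_test_split(*arrays, **options)
- x: feature values of the dataset
- y: target values of the dataset
- test_size: size of the test set, usually a float
- random_state: random seed; different seeds produce different random samples, and the same seed produces the same sample.
- return: training features, test features, training labels, test labels (samples are drawn randomly by default)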
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
# Split with seed 22
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)
print("x_train:\n", x_train.shape)
# Split twice with seed 6 to compare against the split above and against each other
x_train1, x_test1, y_train1, y_test1 = train_test_split(iris.data, iris.target, random_state=6)
x_train2, x_test2, y_train2, y_test2 = train_test_split(iris.data, iris.target, random_state=6)
print("With different random seeds:\n", x_train == x_train1)
print("With the same random seed:\n", x_train1 == x_train2)
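7 Feature engineering
7.1 What is feature preprocessing
- Definition
The process of using transformation functions to convert feature data into feature data that is better suited to the algorithm or model
- Why normalize or standardize?
When the units or magnitudes of the features differ greatly, or when the variance of one feature is several orders of magnitude larger than that of the others, that feature tends to influence (dominate) the result. Some method is therefore needed to make the data dimensionless, so that data on different scales are converted to the same scale.
- API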
sklearn.preprocessing
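7.2 Normalization
Transform the original data so that it is mapped into an interval (by default [0, 1]).
$$X' = \frac{x - \min}{\max - \min}, \qquad X'' = X' \cdot (mx - mi) + mi$$
The transformation is applied to each column: max is the maximum of the column, min is the minimum of the column, X'' is the final result, and mx and mi are the bounds of the target interval (by default mx = 1 and mi = 0).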
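To make the symbols concrete, the sketch below applies the column-wise mapping by hand and compares it with `MinMaxScaler`. The small array is made-up data, and the interval [2, 3] is chosen only to show the effect of non-default mi and mx.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: three samples, three features on very different scales.
X = np.array([[90.0, 2.0, 10.0],
              [60.0, 4.0, 15.0],
              [75.0, 3.0, 13.0]])
mi, mx = 2.0, 3.0  # bounds of the target interval

col_min = X.min(axis=0)                          # min of each column
col_max = X.max(axis=0)                          # max of each column
X_prime = (X - col_min) / (col_max - col_min)    # X', scaled into [0, 1]
manual = X_prime * (mx - mi) + mi                # X'', scaled into [mi, mx]

scaled = MinMaxScaler(feature_range=(2, 3)).fit_transform(X)
print(manual)
print(np.allclose(manual, scaled))
```

Each column is scaled independently, so the smallest value of every column maps to mi and the largest to mx.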
- API
sklearn.preprocessing.MinMaxScaler(feature_range=(0,1)…)
- MinMaxScaler.fit_transform(X)
- X: data in numpy array format, [n_samples, n_features]
- Return value: the transformed array with the same shape
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def minmax_demo():
    """
    Normalization demo
    :return: None
    """
    # dating.txt is expected to contain the columns milage, Liters, Consumtime
    data = pd.read_csv("dating.txt")
    print(data)
    # Scale each selected column into the interval [2, 3]
    transfer = MinMaxScaler(feature_range=(2, 3))
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of min-max normalization:\n", data)
    return None
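- Summary
Note that the maximum and minimum change with the data, and they are very easily affected by outliers. This method is therefore not robust and is only suitable for traditional, precise, small-data scenarios.
7.3 Standardization
Transform the original data so that it has mean 0 and standard deviation 1.
$$X' = \frac{x - \mathrm{mean}}{\sigma}$$
The transformation is applied to each column: mean is the column mean and σ is the column standard deviation.
Note: with enough samples this is fairly stable, which makes it suitable for modern, noisy, big-data scenarios.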
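A minimal check of this column-wise definition follows, again on made-up data. It assumes the population standard deviation (NumPy's default `ddof=0`) for σ, which is also what `StandardScaler` uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: three samples, three features on very different scales.
X = np.array([[90.0, 2.0, 10.0],
              [60.0, 4.0, 15.0],
              [75.0, 3.0, 13.0]])

# Standardize by hand: subtract the column mean, divide by the column standard deviation.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

scaler = StandardScaler()
scaled = scaler.fit_transform(X)

print(manual)
print(np.allclose(manual, scaled))
print(scaler.mean_, scaler.var_)
```

After the transformation every column has mean 0 and standard deviation 1, regardless of its original unit.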
- Difference
- For normalization: if outliers appear and change the maximum or minimum, the result clearly changes
- For standardization: if outliers appear, then given a reasonable amount of data a few outliers have little effect on the mean, so the variance changes only slightly.
- API
sklearn.preprocessing.StandardScaler()
- After processing, the data in each column is centered around mean 0 with standard deviation 1
- StandardScaler.fit_transform(X)
- X: data in numpy array format, [n_samples, n_features]
- Return value: the transformed array with the same shape
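import pandas as pd
from sklearn.preprocessing import StandardScaler

def stand_demo():
    """
    Standardization demo
    :return: None
    """
    # dating.txt is expected to contain the columns milage, Liters, Consumtime
    data = pd.read_csv("dating.txt")
    print(data)
    # Transform each selected column to mean 0 and standard deviation 1
    transfer = StandardScaler()
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of standardization:\n", data)
    print("Mean of each feature column:\n", transfer.mean_)
    print("Variance of each feature column:\n", transfer.var_)
    return None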
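8 Cross-validation & grid search
8.1 What is cross-validation
- Definition
Cross-validation: split the available training data into a training set and a validation set.
Purpose: to make the evaluation of the model more accurate and trustworthy
- Training set: training set + validation set
- Test set: test set
Note: cross-validation does not improve the accuracy of the model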
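As a hedged illustration, scikit-learn's `cross_val_score` can run this procedure on the training portion of the iris data. The choice of 4 folds and of `n_neighbors=5` is an assumption made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)

# 4-fold cross-validation on the training portion only; the test set stays untouched.
# Each fold serves once as the validation set while the other three are used for training.
fold_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), x_train, y_train, cv=4)
print(fold_scores)
print(fold_scores.mean())
```

Averaging the per-fold scores gives a steadier estimate than a single train and validation split. The model itself is no more accurate for it.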
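8.2 What is grid search (Grid Search)
Cross-validation only gives a more reliable evaluation for a given set of parameters. So how are the parameters chosen or tuned?
- Definition
Many parameters usually have to be specified by hand (such as the value of K in the k-nearest neighbors algorithm); these are called hyperparameters. Because tuning them by hand is tedious, several hyperparameter combinations are preset for the model, each combination is evaluated with cross-validation, and the best combination is finally selected to build the model.
8.3 API
8.4 Improving the example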
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# 1. Load the data and split it
iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)
# 2. Feature engineering: standardize (fit on the training set only)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# 3. KNN estimator wrapped in a grid search with 3-fold cross-validation
estimator = KNeighborsClassifier()
param_dict = {"n_neighbors": [1, 3, 5]}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
estimator.fit(x_train, y_train)
# 4. Evaluate on the test set
y_predict = estimator.predict(x_test)
print("Predictions compared with the true values:\n", y_predict == y_test)
score = estimator.score(x_test, y_test)
print("Accuracy computed directly:\n", score)
# 5. Inspect the grid-search results
print("Best score obtained in cross-validation:\n", estimator.best_score_)
print("Best estimator:\n", estimator.best_estimator_)
print("Accuracy results of every cross-validation run:\n", estimator.cv_results_)