外包 | “Pandas” Assignments 20220404

文章目录

外包 | "Pandas" Assignments 20220404
0. 相关文件
1. Exercise W3.01
2. Exercise W3.02
3. Exercise W3.03

0. 相关文件

1. Exercise W3.01

The file w3_Returns.csv contains data on the Dec 2021 return of US companies togeter with a numerical sector code. Load its content into a DataFrame called returns. Choose the appropriate index for this DataFrame (Point 1: 1 Mark).

# csv文件中有缺失值，dropna()丢弃有缺失值的那一行
returns=pd.read_csv('w3_Returns.csv').dropna(axis=0,how='any')
returns=returns.set_index('Ticker')
returns.head()

"""
			Sector	Return
Ticker		
	A		33.0	0.057985
	AA		33.0	0.280464
	AAA		52.0	0.000787
	AAAU	52.0	0.032368
	AAC		52.0	-0.005107
"""

The file w3_Markets.csv contains the name of the stock exchange where each company is listed. Load its content into a DataFrame called markets. Choose the appropriate index for this DataFrame.

markets=pd.read_csv('w3_Markets.csv').dropna(axis=0,how='any')
markets=markets.set_index('Exchange')
markets.head()

"""
			Ticker
Exchange	
NYSE		A
NYSE		AA
ARCA		AAA
ARCA		AAAU
NYSE		AAC
"""

Produce a DataFrame called count having as index the names of the stock markets and as a single column the number of companies listed in each market (Point 2: 1 Mark).

# 根据markets这个dataframe的index即Exchange进行groupby(), 然后.count()统计
count=markets.groupby(markets.index).count()
count

"""
			Ticker
Exchange	
AMEX		254
ARCA		1735
NASDAQ		4155
NYSE		2654
"""

Calculate the average return of all the companies listed on the NYSE. Assign this result to a numeric variable (a number) called average_nyse (Point 3: 1 Mark).

# 在markets中找出在NYSE交易的票, 得到一个list
tck_in_nyse=markets['Ticker']['NYSE'].to_list()
# 在returns中导出所有有记录的票, 得到一个list
tck_in_returns=returns.index.to_list()
# 对两个list取交集
tck_in_common=list(set(tck_in_nyse) & set(tck_in_returns))

# 根据交集中的票在returns中找到对应的回报率进行求和
sum_nyse=returns['Return'][tck_in_common].sum()
# 得到NYSE的数量
num_nyse=count['Ticker']['NYSE']

average_nyse=sum_nyse/num_nyse
average_nyse

"""
0.0249109779279284
"""

2. Exercise W3.02

The file w3_Returns.csv contains data on the Dec 2021 return of US companies, as well as the code of the sector where the companies operate. Load its content into a DataFrame called new_returns. Choose the appropriate index for this DataFrame.

new_returns=pd.read_csv('w3_Returns.csv').dropna(axis=0,how='any')
new_returns=new_returns.set_index('Sector')
new_returns.head()

"""
		Ticker		Return
Sector		
33.0		A		0.057985
33.0		AA		0.280464
52.0		AAA		0.000787
52.0		AAAU	0.032368
52.0		AAC		-0.005107
"""

The file w3_Naics.csv contains the name of the industry associated with each sector code. Load its content into a DataFrame called naics. Choose the appropriate index for this DataFrame (Point 4: 1 Mark).

naics=pd.read_csv('w3_Naics.csv').dropna(axis=0,how='any')
naics=naics.set_index('Sector')
naics.head()

"""

		Industry
Sector	
11		Agriculture, Forestry, Fishing and Hunting
21		Mining, Quarrying, and Oil and Gas Extraction
22		Utilities
23		Construction
31		Manufacturing
"""

Combine new_returns and naics into a single DataFrame called merged that should have:

The Ticker of the companies as index
Only two columns, one with the returns and one with the industry (the name not the numeric code)

You may need more than one command to achieve the final result, we will not consider the intermediate results but only the final form of merged (Point 5: 1 Mark).

# pd.merge()合并两个dataframe, how参数分"inner"和"outer", 可以简单理解为交集和并集, 这里根据的是Sector索引
merged=pd.merge(new_returns,naics,how='inner',on='Sector')
merged=merged.set_index('Ticker')
merged

"""
		Return		Industry
Ticker		
A		0.057985	Manufacturing
AA		0.280464	Manufacturing
AAOI	-0.130288	Manufacturing
AAON	0.018333	Manufacturing
AAPL	0.074229	Manufacturing
"""

Plot the average return of the stocks in each industry in a (vertical) bar plot with the following caracteristics (Point 6: 1 Mark):

Size of 12x6
Title “Average Return by Industry”
The names of the industries on the horizontal axis
The vertical bars should be red
A visible grid

avg_R_by_I=merged.groupby(merged.Industry).mean()
avg_R_by_I

"""
																			Return
Industry	
Accommodation and Food Services												0.062288
Administrative and Support and Waste Management and Remediation Services	-0.017815
Agriculture, Forestry, Fishing and Hunting									0.000734
Arts, Entertainment, and Recreation											0.000078
Construction																0.009472
Educational Services														-0.063471
Finance and Insurance														0.014224
Health Care and Social Assistance											-0.039070
Information																	-0.026538
Management of Companies and Enterprises										-0.026664
Manufacturing																-0.015300
Mining, Quarrying, and Oil and Gas Extraction								-0.011587
Other Services (except Public Administration)								-0.063978
Professional, Scientific, and Technical Services							-0.053912
Public Administration														-0.104600
Real Estate and Rental and Leasing											0.060076
Retail Trade																-0.043317
Transportation and Warehousing												0.025556
Utilities																	0.054209
Wholesale Trade																-0.008142
"""

avg_R_by_I.plot(kind='bar',figsize=(12,6),title='Average Return by Industry',color='red',grid=True)

在这里插入图片描述

3. Exercise W3.03

The file w3_months.xlsx contains monthly returns for the ASX 200 and the S&P 500. Load its content in a DataFrame called monthly_returns and choose the appropriate index for it. You should load directly from the excel file without exporting from Excel to CSV first (Point 7: 1 Mark).

monthly_returns=pd.read_excel('w3_months.xlsx').dropna(axis=0,how='any')
monthly_returns.head()

"""
	Date		ASX_200		SP_500		Month
0	2011-09-30	-6.132068	-7.029882	9
1	2011-10-31	7.246025	10.929312	10
2	2011-11-30	-3.478293	-0.220997	11
3	2011-12-30	-1.352664	1.022909	12
4	2012-01-31	5.083386	4.481542	1
"""

Calculate the number of months in which the US market has a higher return than the Australian market. Using a logical condition (not an IF statement) generate a boolean variable (True or False) called US_best that is True if the US beats Australia in more than 50% of the months and False if it does not (Point 8: 1 Mark).

mr_cond1=monthly_returns.loc[monthly_returns['SP_500']>monthly_returns['ASX_200']]
US_best=len(mr_cond1)>0.5*len(monthly_returns)
US_best

"""
True
"""

Calculate the percentage of monthly observations when the US beats the Australian market considering only returns from the first quarter of each year (Jan, Feb and March). Assign the result (as a number) to a variable called first_best (Point 9: 1 Mark).

mr_cond2=monthly_returns[(monthly_returns.Month==1)|(monthly_returns.Month==2)|(monthly_returns.Month==3)]
mr_cond3=mr_cond1[(mr_cond1.Month==1)|(mr_cond1.Month==2)|(mr_cond1.Month==3)]
first_best=len(mr_cond3)/len(mr_cond2)
first_best

"""
0.5666666666666667
"""

Generate a series called volatility having, as the index, the name of the two stock indexes and as a single column the avarage monthly standard deviation of the two markets. This series should contain only two numbers (Point 10: 1 mark).

std_data=[monthly_returns['ASX_200'].std(),monthly_returns['SP_500'].std()]
volatility=pd.Series(data=std_data,index=['ASX_200','SP_500'])
volatility

"""
ASX_200    3.906967
SP_500     3.879609
dtype: float64
"""