外包 | “Pandas” Assignments 20220404
0. 相关文件
相关文件下载
1. Exercise W3.01
The file w3_Returns.csv contains data on the Dec 2021 return of US companies togeter with a numerical sector code. Load its content into a DataFrame called returns . Choose the appropriate index for this DataFrame (Point 1: 1 Mark).
returns=pd.read_csv('w3_Returns.csv').dropna(axis=0,how='any')
returns=returns.set_index('Ticker')
returns.head()
"""
Sector Return
Ticker
A 33.0 0.057985
AA 33.0 0.280464
AAA 52.0 0.000787
AAAU 52.0 0.032368
AAC 52.0 -0.005107
"""
The file w3_Markets.csv contains the name of the stock exchange where each company is listed. Load its content into a DataFrame called markets . Choose the appropriate index for this DataFrame.
markets=pd.read_csv('w3_Markets.csv').dropna(axis=0,how='any')
markets=markets.set_index('Exchange')
markets.head()
"""
Ticker
Exchange
NYSE A
NYSE AA
ARCA AAA
ARCA AAAU
NYSE AAC
"""
Produce a DataFrame called count having as index the names of the stock markets and as a single column the number of companies listed in each market (Point 2: 1 Mark).
count=markets.groupby(markets.index).count()
count
"""
Ticker
Exchange
AMEX 254
ARCA 1735
NASDAQ 4155
NYSE 2654
"""
Calculate the average return of all the companies listed on the NYSE. Assign this result to a numeric variable (a number) called average_nyse (Point 3: 1 Mark).
tck_in_nyse=markets['Ticker']['NYSE'].to_list()
tck_in_returns=returns.index.to_list()
tck_in_common=list(set(tck_in_nyse) & set(tck_in_returns))
sum_nyse=returns['Return'][tck_in_common].sum()
num_nyse=count['Ticker']['NYSE']
average_nyse=sum_nyse/num_nyse
average_nyse
"""
0.0249109779279284
"""
2. Exercise W3.02
The file w3_Returns.csv contains data on the Dec 2021 return of US companies, as well as the code of the sector where the companies operate. Load its content into a DataFrame called new_returns . Choose the appropriate index for this DataFrame.
new_returns=pd.read_csv('w3_Returns.csv').dropna(axis=0,how='any')
new_returns=new_returns.set_index('Sector')
new_returns.head()
"""
Ticker Return
Sector
33.0 A 0.057985
33.0 AA 0.280464
52.0 AAA 0.000787
52.0 AAAU 0.032368
52.0 AAC -0.005107
"""
The file w3_Naics.csv contains the name of the industry associated with each sector code. Load its content into a DataFrame called naics . Choose the appropriate index for this DataFrame (Point 4: 1 Mark).
naics=pd.read_csv('w3_Naics.csv').dropna(axis=0,how='any')
naics=naics.set_index('Sector')
naics.head()
"""
Industry
Sector
11 Agriculture, Forestry, Fishing and Hunting
21 Mining, Quarrying, and Oil and Gas Extraction
22 Utilities
23 Construction
31 Manufacturing
"""
Combine new_returns and naics into a single DataFrame called merged that should have:
- The Ticker of the companies as index
- Only two columns, one with the returns and one with the industry (the name not the numeric code)
You may need more than one command to achieve the final result, we will not consider the intermediate results but only the final form of merged (Point 5: 1 Mark).
merged=pd.merge(new_returns,naics,how='inner',on='Sector')
merged=merged.set_index('Ticker')
merged
"""
Return Industry
Ticker
A 0.057985 Manufacturing
AA 0.280464 Manufacturing
AAOI -0.130288 Manufacturing
AAON 0.018333 Manufacturing
AAPL 0.074229 Manufacturing
"""
Plot the average return of the stocks in each industry in a (vertical) bar plot with the following caracteristics (Point 6: 1 Mark):
- Size of 12x6
- Title “Average Return by Industry”
- The names of the industries on the horizontal axis
- The vertical bars should be red
- A visible grid
avg_R_by_I=merged.groupby(merged.Industry).mean()
avg_R_by_I
"""
Return
Industry
Accommodation and Food Services 0.062288
Administrative and Support and Waste Management and Remediation Services -0.017815
Agriculture, Forestry, Fishing and Hunting 0.000734
Arts, Entertainment, and Recreation 0.000078
Construction 0.009472
Educational Services -0.063471
Finance and Insurance 0.014224
Health Care and Social Assistance -0.039070
Information -0.026538
Management of Companies and Enterprises -0.026664
Manufacturing -0.015300
Mining, Quarrying, and Oil and Gas Extraction -0.011587
Other Services (except Public Administration) -0.063978
Professional, Scientific, and Technical Services -0.053912
Public Administration -0.104600
Real Estate and Rental and Leasing 0.060076
Retail Trade -0.043317
Transportation and Warehousing 0.025556
Utilities 0.054209
Wholesale Trade -0.008142
"""
avg_R_by_I.plot(kind='bar',figsize=(12,6),title='Average Return by Industry',color='red',grid=True)
3. Exercise W3.03
The file w3_months.xlsx contains monthly returns for the ASX 200 and the S&P 500. Load its content in a DataFrame called monthly_returns and choose the appropriate index for it. You should load directly from the excel file without exporting from Excel to CSV first (Point 7: 1 Mark).
monthly_returns=pd.read_excel('w3_months.xlsx').dropna(axis=0,how='any')
monthly_returns.head()
"""
Date ASX_200 SP_500 Month
0 2011-09-30 -6.132068 -7.029882 9
1 2011-10-31 7.246025 10.929312 10
2 2011-11-30 -3.478293 -0.220997 11
3 2011-12-30 -1.352664 1.022909 12
4 2012-01-31 5.083386 4.481542 1
"""
Calculate the number of months in which the US market has a higher return than the Australian market. Using a logical condition (not an IF statement) generate a boolean variable (True or False ) called US_best that is True if the US beats Australia in more than 50% of the months and False if it does not (Point 8: 1 Mark).
mr_cond1=monthly_returns.loc[monthly_returns['SP_500']>monthly_returns['ASX_200']]
US_best=len(mr_cond1)>0.5*len(monthly_returns)
US_best
"""
True
"""
Calculate the percentage of monthly observations when the US beats the Australian market considering only returns from the first quarter of each year (Jan, Feb and March). Assign the result (as a number) to a variable called first_best (Point 9: 1 Mark).
mr_cond2=monthly_returns[(monthly_returns.Month==1)|(monthly_returns.Month==2)|(monthly_returns.Month==3)]
mr_cond3=mr_cond1[(mr_cond1.Month==1)|(mr_cond1.Month==2)|(mr_cond1.Month==3)]
first_best=len(mr_cond3)/len(mr_cond2)
first_best
"""
0.5666666666666667
"""
Generate a series called volatility having, as the index, the name of the two stock indexes and as a single column the avarage monthly standard deviation of the two markets. This series should contain only two numbers (Point 10: 1 mark).
std_data=[monthly_returns['ASX_200'].std(),monthly_returns['SP_500'].std()]
volatility=pd.Series(data=std_data,index=['ASX_200','SP_500'])
volatility
"""
ASX_200 3.906967
SP_500 3.879609
dtype: float64
"""
|