An Overview: Rental Expenditures in Tehran¶
In this tutorial, we will analyze the residential rental expenses of Tehran residents step-by-step. The goal is to provide an overview of the top features of the hbsir package.
We will get started right away by importing the necessary libraries!
import pandas as pd
import hbsir
Loading Data¶
In order to kick off our analysis, let's begin by loading the necessary data. While information about rent is available in the "home" table, for a more comprehensive overview, we'll load the "Expenditures" table for the year 1401, which includes all household expenditures.
table = hbsir.load_table("Expenditures", 1401)
table.head()
Year | ID | Table_Name | Commodity_Code | Provision_Method | Amount | Price | Gross_Expenditure | Net_Expenditure | |
---|---|---|---|---|---|---|---|---|---|
0 | 1401 | 20205408326 | food | 11615 | Purchase | 2.0 | 550000.0 | 13200000.0 | 13200000.0 |
1 | 1401 | 20205408326 | food | 11616 | Purchase | 3.0 | 450000.0 | 16200000.0 | 16200000.0 |
2 | 1401 | 20205408326 | food | 11641 | Purchase | 5.0 | 120000.0 | 7200000.0 | 7200000.0 |
3 | 1401 | 20205408326 | food | 11642 | Purchase | 9.0 | 85000.0 | 9180000.0 | 9180000.0 |
4 | 1401 | 20205408326 | food | 11652 | Purchase | 1.0 | 700000.0 | 8400000.0 | 8400000.0 |
A comprehensive description of the structure and content of all tables is available in the documentation.
Adding Household Attributes¶
As we are interested in analyzing urban households, we'll filter the data to focus on this segment.
First, we add the "Urban_Rural" attribute to the table.
table = hbsir.add_attribute(table, name="Urban_Rural")
table.head()
Year | ID | Table_Name | Commodity_Code | Provision_Method | Amount | Price | Gross_Expenditure | Net_Expenditure | Urban_Rural | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1401 | 20205408326 | food | 11615 | Purchase | 2.0 | 550000.0 | 13200000.0 | 13200000.0 | Rural |
1 | 1401 | 20205408326 | food | 11616 | Purchase | 3.0 | 450000.0 | 16200000.0 | 16200000.0 | Rural |
2 | 1401 | 20205408326 | food | 11641 | Purchase | 5.0 | 120000.0 | 7200000.0 | 7200000.0 | Rural |
3 | 1401 | 20205408326 | food | 11642 | Purchase | 9.0 | 85000.0 | 9180000.0 | 9180000.0 | Rural |
4 | 1401 | 20205408326 | food | 11652 | Purchase | 1.0 | 700000.0 | 8400000.0 | 8400000.0 | Rural |
We also need to add province information:
table = hbsir.add_attribute(table, name="Province")
table.head()
Year | ID | Table_Name | Commodity_Code | Provision_Method | Amount | Price | Gross_Expenditure | Net_Expenditure | Urban_Rural | Province | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1401 | 20205408326 | food | 11615 | Purchase | 2.0 | 550000.0 | 13200000.0 | 13200000.0 | Rural | Mazandaran |
1 | 1401 | 20205408326 | food | 11616 | Purchase | 3.0 | 450000.0 | 16200000.0 | 16200000.0 | Rural | Mazandaran |
2 | 1401 | 20205408326 | food | 11641 | Purchase | 5.0 | 120000.0 | 7200000.0 | 7200000.0 | Rural | Mazandaran |
3 | 1401 | 20205408326 | food | 11642 | Purchase | 9.0 | 85000.0 | 9180000.0 | 9180000.0 | Rural | Mazandaran |
4 | 1401 | 20205408326 | food | 11652 | Purchase | 1.0 | 700000.0 | 8400000.0 | 8400000.0 | Rural | Mazandaran |
Now that we have urban/rural and province data, we can filter the table to only include urban Tehran households:
filt = (table["Urban_Rural"] == "Urban") & (table["Province"] == "Tehran")
table = table.loc[filt]
table.head()
Year | ID | Table_Name | Commodity_Code | Provision_Method | Amount | Price | Gross_Expenditure | Net_Expenditure | Urban_Rural | Province | |
---|---|---|---|---|---|---|---|---|---|---|---|
624297 | 1401 | 12301265526 | food | 11143 | Purchase | 2.0 | 65000.0 | 1560000.0 | 1560000.0 | Urban | Tehran |
624298 | 1401 | 12301265526 | food | 11151 | Purchase | 6.0 | 55000.0 | 3960000.0 | 3960000.0 | Urban | Tehran |
624299 | 1401 | 12301265526 | food | 11171 | Purchase | 1.0 | 600000.0 | 7200000.0 | 7200000.0 | Urban | Tehran |
624300 | 1401 | 12301265526 | food | 11211 | Purchase | 1.5 | 1850000.0 | 33300000.0 | 33300000.0 | Urban | Tehran |
624301 | 1401 | 12301265526 | food | 11231 | Purchase | 6.0 | 380000.0 | 27360000.0 | 27360000.0 | Urban | Tehran |
Adding Classification¶
In the Iran Household Budget Survey, various codes exist for commodities, occupations, industries, etc. Each code can be decoded in multiple ways. Rather than hardcoding the meanings of these codes, HBSIR provides a function to interpret them flexibly based on the user's needs.
While delving into all the intricacies of this function is an advanced topic, basic usage is straightforward. For example:
hbsir.add_classification(table, name="Food_NonFood").head()
Year | ID | Table_Name | Commodity_Code | Provision_Method | Amount | Price | Gross_Expenditure | Net_Expenditure | Urban_Rural | Province | Food_NonFood | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
624297 | 1401 | 12301265526 | food | 11143 | Purchase | 2.0 | 65000.0 | 1560000.0 | 1560000.0 | Urban | Tehran | Food |
624298 | 1401 | 12301265526 | food | 11151 | Purchase | 6.0 | 55000.0 | 3960000.0 | 3960000.0 | Urban | Tehran | Food |
624299 | 1401 | 12301265526 | food | 11171 | Purchase | 1.0 | 600000.0 | 7200000.0 | 7200000.0 | Urban | Tehran | Food |
624300 | 1401 | 12301265526 | food | 11211 | Purchase | 1.5 | 1850000.0 | 33300000.0 | 33300000.0 | Urban | Tehran | Food |
624301 | 1401 | 12301265526 | food | 11231 | Purchase | 6.0 | 380000.0 | 27360000.0 | 27360000.0 | Urban | Tehran | Food |
In our case, we need the original Statistical Center of Iran (SCI) COICOP codes. COICOP is available at 4 classification levels in this dataset. We will use level 2 and name the output column "COICOP":
table = hbsir.add_classification(
table,
name="original",
levels=2,
column_names="COICOP",
)
table.head()
Year | ID | Table_Name | Commodity_Code | Provision_Method | Amount | Price | Gross_Expenditure | Net_Expenditure | Urban_Rural | Province | COICOP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
624297 | 1401 | 12301265526 | food | 11143 | Purchase | 2.0 | 65000.0 | 1560000.0 | 1560000.0 | Urban | Tehran | food |
624298 | 1401 | 12301265526 | food | 11151 | Purchase | 6.0 | 55000.0 | 3960000.0 | 3960000.0 | Urban | Tehran | food |
624299 | 1401 | 12301265526 | food | 11171 | Purchase | 1.0 | 600000.0 | 7200000.0 | 7200000.0 | Urban | Tehran | food |
624300 | 1401 | 12301265526 | food | 11211 | Purchase | 1.5 | 1850000.0 | 33300000.0 | 33300000.0 | Urban | Tehran | food |
624301 | 1401 | 12301265526 | food | 11231 | Purchase | 6.0 | 380000.0 | 27360000.0 | 27360000.0 | Urban | Tehran | food |
At this level, there is a category representing rent: "actual_rentals_for_housing"
Household Aggregation¶
For any meaningful analysis, we need to incorporate sampling weights to make the data representative. But first, we will aggregate the data by household to calculate total and rental expenditure for each household.
We can perform this aggregation using Pandas' groupby method:
total_expenditure = table.groupby(["Year", "ID"])["Gross_Expenditure"].sum()
total_expenditure = total_expenditure.rename("Total_Expenditure")
total_expenditure.head()
Year ID 1401 12301265430 4.411560e+09 12301265433 1.263400e+09 12301265436 5.604360e+09 12301265438 5.972500e+08 12301265441 8.787720e+09 Name: Total_Expenditure, dtype: float64
This sums the gross expenditure for each unique combination of year and household ID.
Next, we will filter just the "actual_rentals_for_housing" category to calculate rental expenditure specifically:
filt = table["COICOP"] == "actual_rentals_for_housing"
rental_expenditure = table.loc[filt].groupby(["Year", "ID"])["Gross_Expenditure"].sum()
rental_expenditure = rental_expenditure.rename("Rental_Expenditure")
rental_expenditure.head()
Year ID 1401 12301265541 4.200000e+08 12301265732 1.800000e+09 12301265834 7.800000e+08 12301265932 1.200000e+09 12301265938 2.400000e+09 Name: Rental_Expenditure, dtype: float64
With these two aggregated expenditure series, we can simply concatenate them column-wise. Since the indices are unique, this joins the data without duplication:
expenditure_table = pd.concat([total_expenditure, rental_expenditure], axis="columns")
expenditure_table = expenditure_table.fillna(0)
expenditure_table.head()
Total_Expenditure | Rental_Expenditure | ||
---|---|---|---|
Year | ID | ||
1401 | 12301265430 | 4.411560e+09 | 0.0 |
12301265433 | 1.263400e+09 | 0.0 | |
12301265436 | 5.604360e+09 | 0.0 | |
12301265438 | 5.972500e+08 | 0.0 | |
12301265441 | 8.787720e+09 | 0.0 |
With our aggregated household expenditure data in place, we are now ready to proceed to adding sampling weights.
Adding Sampling Weights¶
Sampling weights for 1396 and onward are available in the "household_information" table. For prior years, the weights are not included in the raw data but HBSIR has integrated them from the SCI summary data. The add_weight function handles these differences behind the scenes and adds the appropriate sampling weights regardless of year.
So we can simply call:
expenditure_table = hbsir.add_weight(expenditure_table)
expenditure_table.head()
Total_Expenditure | Rental_Expenditure | Weight | ||
---|---|---|---|---|
Year | ID | |||
1401 | 12301265430 | 4.411560e+09 | 0.0 | 3752.561959 |
12301265433 | 1.263400e+09 | 0.0 | 3752.561959 | |
12301265436 | 5.604360e+09 | 0.0 | 3752.561959 | |
12301265438 | 5.972500e+08 | 0.0 | 3752.561959 | |
12301265441 | 8.787720e+09 | 0.0 | 3752.561959 |
Adding Deciles¶
Finally, to compare rental expenditure across different income levels, we want to categorize households into deciles by income. We can do this using the add_decile function:
expenditure_table = hbsir.calculate.add_decile(
expenditure_table,
on_variable="Income",
equivalence_scale="OECD_Modified",
groupby=["Urban_Rural", "Province"],
)
expenditure_table.head()
Total_Expenditure | Rental_Expenditure | Weight | Decile | ||
---|---|---|---|---|---|
Year | ID | ||||
1401 | 12301265430 | 4.411560e+09 | 0.0 | 3752.561959 | 10 |
12301265433 | 1.263400e+09 | 0.0 | 3752.561959 | 3 | |
12301265436 | 5.604360e+09 | 0.0 | 3752.561959 | 10 | |
12301265438 | 5.972500e+08 | 0.0 | 3752.561959 | 6 | |
12301265441 | 8.787720e+09 | 0.0 | 3752.561959 | 10 |
The groupby parameter allows deciling within segments, rather than across the full population. Here we group by urban/rural status and province, meaning income deciles will be defined separately for each combination of these groups. This allows more relevant comparison of households.
But what is this "equivalence_scale" parameter? By default value, "Household", deciling occurs on the absolute income value regardless of household size. With "Per_Capita", the income is divided by the number of members first. "OECD", "OECD_Modified", and "Square_Root" are in-between options accounting for economies of scale in larger households.
For more detailed information on equivalence scales, see the OECD Note on Equivalence Scales.
With all necessary columns added, we can now proceed to the analysis!
Calculate Weighted Average¶
Finally, we want to analyze the data by calculating the weighted average expenditure (total and rental) for each income decile. HBSIR provides a convenient function for this common task:
average_expenditure = hbsir.calculate.average_table(
expenditure_table,
groupby=["Decile"],
columns=["Total_Expenditure", "Rental_Expenditure"]
)
average_expenditure
Total_Expenditure | Rental_Expenditure | |
---|---|---|
Decile | ||
1 | 9.534972e+08 | 1.096976e+08 |
2 | 1.150881e+09 | 8.947766e+07 |
3 | 1.376849e+09 | 1.021392e+08 |
4 | 1.425761e+09 | 7.931513e+07 |
5 | 1.657409e+09 | 7.098628e+07 |
6 | 1.970473e+09 | 1.195037e+08 |
7 | 1.960909e+09 | 7.530307e+07 |
8 | 2.413251e+09 | 1.335339e+08 |
9 | 2.946481e+09 | 1.288825e+08 |
10 | 3.843349e+09 | 1.080960e+08 |
With the weighted averages, we can easily calculate rental expenditure as a percentage of total expenditure for each decile:
ratio = (
average_expenditure["Rental_Expenditure"]
/ average_expenditure["Total_Expenditure"]
* 100
)
ratio
Decile 1 11.504765 2 7.774710 3 7.418333 4 5.563004 5 4.282966 6 6.064720 7 3.840211 8 5.533363 9 4.374115 10 2.812547 dtype: float64
Finally, we can visualize the results in a bar chart:
ratio.plot.bar()
<Axes: xlabel='Decile'>
And with that, we have explored the key steps for analyzing rental expenditure data with HBSIR! This concludes our introductory tutorial walking through the main functions of the package.