An Overview: Rental Expenditures in Tehran¶

In this tutorial, we will analyze the residential rental expenses of Tehran residents step-by-step. The goal is to provide an overview of the top features of the hbsir package.

We will get started right away by importing the necessary libraries!

In [1]:

Copied!

import pandas as pd
import hbsir
import pandas as pd
import hbsir

Loading Data¶

In order to kick off our analysis, let's begin by loading the necessary data. While information about rent is available in the "home" table, for a more comprehensive overview, we'll load the "Expenditures" table for the year 1401, which includes all household expenditures.

In [2]:

Copied!

table = hbsir.load_table("Expenditures", 1401)
table.head()
table = hbsir.load_table("Expenditures", 1401)
table.head()

Out[2]:

	Year	ID	Table_Name	Commodity_Code	Provision_Method	Amount	Price	Gross_Expenditure	Net_Expenditure
0	1401	20205408326	food	11615	Purchase	24.0	550000.0	13200000.0	13200000.0
1	1401	20205408326	food	11616	Purchase	36.0	450000.0	16200000.0	16200000.0
2	1401	20205408326	food	11641	Purchase	60.0	120000.0	7200000.0	7200000.0
3	1401	20205408326	food	11642	Purchase	108.0	85000.0	9180000.0	9180000.0
4	1401	20205408326	food	11652	Purchase	12.0	700000.0	8400000.0	8400000.0

A comprehensive description of the structure and content of all tables is available in the documentation.

Adding Household Attributes¶

As we are interested in analyzing urban households, we'll filter the data to focus on this segment.

First, we add the "Urban_Rural" attribute to the table.

In [3]:

Copied!

table = hbsir.add_attribute(table, name="Urban_Rural")
table.head()
table = hbsir.add_attribute(table, name="Urban_Rural")
table.head()

Out[3]:

	Year	ID	Table_Name	Commodity_Code	Provision_Method	Amount	Price	Gross_Expenditure	Net_Expenditure	Urban_Rural
0	1401	20205408326	food	11615	Purchase	24.0	550000.0	13200000.0	13200000.0	Rural
1	1401	20205408326	food	11616	Purchase	36.0	450000.0	16200000.0	16200000.0	Rural
2	1401	20205408326	food	11641	Purchase	60.0	120000.0	7200000.0	7200000.0	Rural
3	1401	20205408326	food	11642	Purchase	108.0	85000.0	9180000.0	9180000.0	Rural
4	1401	20205408326	food	11652	Purchase	12.0	700000.0	8400000.0	8400000.0	Rural

We also need to add province information:

In [4]:

Copied!

table = hbsir.add_attribute(table, name="Province")
table.head()
table = hbsir.add_attribute(table, name="Province")
table.head()

Out[4]:

	Year	ID	Table_Name	Commodity_Code	Provision_Method	Amount	Price	Gross_Expenditure	Net_Expenditure	Urban_Rural	Province
0	1401	20205408326	food	11615	Purchase	24.0	550000.0	13200000.0	13200000.0	Rural	Mazandaran
1	1401	20205408326	food	11616	Purchase	36.0	450000.0	16200000.0	16200000.0	Rural	Mazandaran
2	1401	20205408326	food	11641	Purchase	60.0	120000.0	7200000.0	7200000.0	Rural	Mazandaran
3	1401	20205408326	food	11642	Purchase	108.0	85000.0	9180000.0	9180000.0	Rural	Mazandaran
4	1401	20205408326	food	11652	Purchase	12.0	700000.0	8400000.0	8400000.0	Rural	Mazandaran

Now that we have urban/rural and province data, we can filter the table to only include urban Tehran households:

In [5]:

Copied!

filt = (table["Urban_Rural"] == "Urban") & (table["Province"] == "Tehran")
table = table.loc[filt]
table.head()
filt = (table["Urban_Rural"] == "Urban") & (table["Province"] == "Tehran")
table = table.loc[filt]
table.head()

Out[5]:

	Year	ID	Table_Name	Commodity_Code	Provision_Method	Amount	Price	Gross_Expenditure	Net_Expenditure	Urban_Rural	Province
624297	1401	12301265526	food	11143	Purchase	24.0	65000.0	1560000.0	1560000.0	Urban	Tehran
624298	1401	12301265526	food	11151	Purchase	72.0	55000.0	3960000.0	3960000.0	Urban	Tehran
624299	1401	12301265526	food	11171	Purchase	12.0	600000.0	7200000.0	7200000.0	Urban	Tehran
624300	1401	12301265526	food	11211	Purchase	18.0	1850000.0	33300000.0	33300000.0	Urban	Tehran
624301	1401	12301265526	food	11231	Purchase	72.0	380000.0	27360000.0	27360000.0	Urban	Tehran

Adding Classification¶

In the Iran Household Budget Survey, various codes exist for commodities, occupations, industries, etc. Each code can be decoded in multiple ways. Rather than hardcoding the meanings of these codes, HBSIR provides a function to interpret them flexibly based on the user's needs.

While delving into all the intricacies of this function is an advanced topic, basic usage is straightforward. For example:

In [6]:

Copied!

hbsir.add_classification(table, name="Food_NonFood").head()
hbsir.add_classification(table, name="Food_NonFood").head()

Out[6]:

	Year	ID	Table_Name	Commodity_Code	Provision_Method	Amount	Price	Gross_Expenditure	Net_Expenditure	Urban_Rural	Province	Food_NonFood
624297	1401	12301265526	food	11143	Purchase	24.0	65000.0	1560000.0	1560000.0	Urban	Tehran	Food
624298	1401	12301265526	food	11151	Purchase	72.0	55000.0	3960000.0	3960000.0	Urban	Tehran	Food
624299	1401	12301265526	food	11171	Purchase	12.0	600000.0	7200000.0	7200000.0	Urban	Tehran	Food
624300	1401	12301265526	food	11211	Purchase	18.0	1850000.0	33300000.0	33300000.0	Urban	Tehran	Food
624301	1401	12301265526	food	11231	Purchase	72.0	380000.0	27360000.0	27360000.0	Urban	Tehran	Food

In our case, we need the original Statistical Center of Iran (SCI) COICOP codes. COICOP is available at 4 classification levels in this dataset. We will use level 2 and name the output column "COICOP":

In [7]:

Copied!





table = hbsir.add_classification(
    table,
    name="original",
    levels=2,
    column_names="COICOP",
)
table.head()
table = hbsir.add_classification(
    table,
    name="original",
    levels=2,
    column_names="COICOP",
)
table.head()

Out[7]:

	Year	ID	Table_Name	Commodity_Code	Provision_Method	Amount	Price	Gross_Expenditure	Net_Expenditure	Urban_Rural	Province	COICOP
624297	1401	12301265526	food	11143	Purchase	24.0	65000.0	1560000.0	1560000.0	Urban	Tehran	food
624298	1401	12301265526	food	11151	Purchase	72.0	55000.0	3960000.0	3960000.0	Urban	Tehran	food
624299	1401	12301265526	food	11171	Purchase	12.0	600000.0	7200000.0	7200000.0	Urban	Tehran	food
624300	1401	12301265526	food	11211	Purchase	18.0	1850000.0	33300000.0	33300000.0	Urban	Tehran	food
624301	1401	12301265526	food	11231	Purchase	72.0	380000.0	27360000.0	27360000.0	Urban	Tehran	food

At this level, there is a category representing rent: "actual_rentals_for_housing"

Household Aggregation¶

For any meaningful analysis, we need to incorporate sampling weights to make the data representative. But first, we will aggregate the data by household to calculate total and rental expenditure for each household.

We can perform this aggregation using Pandas' groupby method:

In [8]:

Copied!

total_expenditure = table.groupby(["Year", "ID"])["Gross_Expenditure"].sum()
total_expenditure = total_expenditure.rename("Total_Expenditure")
total_expenditure.head()
total_expenditure = table.groupby(["Year", "ID"])["Gross_Expenditure"].sum()
total_expenditure = total_expenditure.rename("Total_Expenditure")
total_expenditure.head()

Out[8]:

Year  ID         
1401  12301265430    4.411560e+09
      12301265433    1.263400e+09
      12301265436    5.604360e+09
      12301265438    5.972500e+08
      12301265441    8.787720e+09
Name: Total_Expenditure, dtype: float64

This sums the gross expenditure for each unique combination of year and household ID.

Next, we will filter just the "actual_rentals_for_housing" category to calculate rental expenditure specifically:

In [9]:

Copied!





filt = table["COICOP"] == "actual_rentals_for_housing"
rental_expenditure = table.loc[filt].groupby(["Year", "ID"])["Gross_Expenditure"].sum()
rental_expenditure = rental_expenditure.rename("Rental_Expenditure")
rental_expenditure.head()
filt = table["COICOP"] == "actual_rentals_for_housing"
rental_expenditure = table.loc[filt].groupby(["Year", "ID"])["Gross_Expenditure"].sum()
rental_expenditure = rental_expenditure.rename("Rental_Expenditure")
rental_expenditure.head()

Out[9]:

Year  ID         
1401  12301265541    4.200000e+08
      12301265732    1.800000e+09
      12301265834    7.800000e+08
      12301265932    1.200000e+09
      12301265938    2.400000e+09
Name: Rental_Expenditure, dtype: float64

With these two aggregated expenditure series, we can simply concatenate them column-wise. Since the indices are unique, this joins the data without duplication:

In [10]:

Copied!

expenditure_table = pd.concat([total_expenditure, rental_expenditure], axis="columns")
expenditure_table = expenditure_table.fillna(0)
expenditure_table.head()
expenditure_table = pd.concat([total_expenditure, rental_expenditure], axis="columns")
expenditure_table = expenditure_table.fillna(0)
expenditure_table.head()

Out[10]:

		Total_Expenditure	Rental_Expenditure
Year	ID
1401	12301265430	4.411560e+09	0.0
	12301265433	1.263400e+09	0.0
	12301265436	5.604360e+09	0.0
	12301265438	5.972500e+08	0.0
	12301265441	8.787720e+09	0.0

With our aggregated household expenditure data in place, we are now ready to proceed to adding sampling weights.

Adding Sampling Weights¶

Sampling weights for 1396 and onward are available in the "household_information" table. For prior years, the weights are not included in the raw data but HBSIR has integrated them from the SCI summary data. The add_weight function handles these differences behind the scenes and adds the appropriate sampling weights regardless of year.

So we can simply call:

In [11]:

Copied!

expenditure_table = hbsir.add_weight(expenditure_table)
expenditure_table.head()
expenditure_table = hbsir.add_weight(expenditure_table)
expenditure_table.head()

Out[11]:

		Total_Expenditure	Rental_Expenditure	Weight
Year	ID
1401	12301265430	4.411560e+09	0.0	3752.561959
	12301265433	1.263400e+09	0.0	3752.561959
	12301265436	5.604360e+09	0.0	3752.561959
	12301265438	5.972500e+08	0.0	3752.561959
	12301265441	8.787720e+09	0.0	3752.561959

Adding Deciles¶

Finally, to compare rental expenditure across different income levels, we want to categorize households into deciles by income. We can do this using the add_decile function:

In [12]:

Copied!





expenditure_table = hbsir.calculate.add_decile(
    expenditure_table,
    on_variable="Income",
    equivalence_scale="OECD_Modified",
    groupby=["Urban_Rural", "Province"],
)
expenditure_table.head()
expenditure_table = hbsir.calculate.add_decile(
    expenditure_table,
    on_variable="Income",
    equivalence_scale="OECD_Modified",
    groupby=["Urban_Rural", "Province"],
)
expenditure_table.head()

C:\Users\Amin\Work\Packages\HBSIR\.venv\Lib\site-packages\bssir\calculator\quantile.py:144: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
  .apply(self._calculate_subgroup_quantile)

Out[12]:

		Total_Expenditure	Rental_Expenditure	Weight	Decile
Year	ID
1401	12301265430	4.411560e+09	0.0	3752.561959	10
	12301265433	1.263400e+09	0.0	3752.561959	3
	12301265436	5.604360e+09	0.0	3752.561959	10
	12301265438	5.972500e+08	0.0	3752.561959	6
	12301265441	8.787720e+09	0.0	3752.561959	10

The groupby parameter allows deciling within segments, rather than across the full population. Here we group by urban/rural status and province, meaning income deciles will be defined separately for each combination of these groups. This allows more relevant comparison of households.

But what is this "equivalence_scale" parameter? By default value, "Household", deciling occurs on the absolute income value regardless of household size. With "Per_Capita", the income is divided by the number of members first. "OECD", "OECD_Modified", and "Square_Root" are in-between options accounting for economies of scale in larger households.

For more detailed information on equivalence scales, see the OECD Note on Equivalence Scales.

With all necessary columns added, we can now proceed to the analysis!

Calculate Weighted Average¶

Finally, we want to analyze the data by calculating the weighted average expenditure (total and rental) for each income decile. HBSIR provides a convenient function for this common task:

In [13]:

Copied!





average_expenditure = hbsir.calculate.average_table(
    expenditure_table,
    groupby=["Decile"],
    columns=["Total_Expenditure", "Rental_Expenditure"],
)
average_expenditure
average_expenditure = hbsir.calculate.average_table(
    expenditure_table,
    groupby=["Decile"],
    columns=["Total_Expenditure", "Rental_Expenditure"],
)
average_expenditure

C:\Users\Amin\Work\Packages\HBSIR\.venv\Lib\site-packages\bssir\calculator\average.py:106: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
  result = table.groupby(groupby).apply(

Out[13]:

	Total_Expenditure	Rental_Expenditure
Decile
1	9.534972e+08	1.096976e+08
2	1.150881e+09	8.947766e+07
3	1.376849e+09	1.021392e+08
4	1.425761e+09	7.931513e+07
5	1.657409e+09	7.098628e+07
6	1.970473e+09	1.195037e+08
7	1.960909e+09	7.530307e+07
8	2.413251e+09	1.335339e+08
9	2.946481e+09	1.288825e+08
10	3.843349e+09	1.080960e+08

With the weighted averages, we can easily calculate rental expenditure as a percentage of total expenditure for each decile:

In [14]:

Copied!





ratio = (
    average_expenditure["Rental_Expenditure"]
    / average_expenditure["Total_Expenditure"]
    * 100
)
ratio
ratio = (
    average_expenditure["Rental_Expenditure"]
    / average_expenditure["Total_Expenditure"]
    * 100
)
ratio

Out[14]:

Decile
1     11.504765
2      7.774710
3      7.418333
4      5.563004
5      4.282966
6      6.064720
7      3.840211
8      5.533363
9      4.374115
10     2.812547
dtype: float64

Finally, we can visualize the results in a bar chart:

In [15]:

Copied!

ratio.plot.bar()
ratio.plot.bar()

Out[15]:

<Axes: xlabel='Decile'>

No description has been provided for this image

And with that, we have explored the key steps for analyzing rental expenditure data with HBSIR! This concludes our introductory tutorial walking through the main functions of the package.