Getting Started

The LFSIR package provides easy access to Iran's Labor Force Survey data. We can load the survey tables and enrich them with additional information.

In [1]:

import lfsir

Loading Survey Data

The load_table function loads survey data for a given year. Let's load the data for 1401 (Iranian calendar, roughly March 2022 to March 2023).

In [2]:

df = lfsir.load_table(years=1401)
df.head()

Out[2]:

	ID	Census_Turn	Alternative_Household	Member_Number	Relationship	Sex	Birth_Month	Birth_Year	Age	Nationality	...	Ready_To_Start_Job_If_Found_Last_Week_2	Reason_Not_Ready_For_Work	Situation_Last_Week	Preferred_Hours_Per_Day	Preferred_Days_Per_Week	Preferred_Employment_Status	Preferred_Work_Sector	Weight	Activity_State	Year
0	110105161	3	False	2	Spouse	Female	<NA>	49	52	Iranian	...	<NA>	NaN	Student	<NA>	<NA>	NaN	NaN	274.358093	Inactive	1401
1	110105161	3	False	3	Child	Female	<NA>	77	24	Iranian	...	<NA>	NaN	Employed	<NA>	<NA>	NaN	NaN	274.358093	Inactive	1401
2	110105162	2	False	1	Head	Male	3	50	50	Iranian	...	<NA>	NaN	NaN	<NA>	<NA>	NaN	NaN	288.560120	Employed	1401
3	110105162	2	False	2	Spouse	Female	6	56	44	Iranian	...	<NA>	NaN	Student	<NA>	<NA>	NaN	NaN	288.560120	Inactive	1401
4	110105162	2	False	3	Child	Male	2	77	24	Iranian	...	<NA>	NaN	Employed	<NA>	<NA>	NaN	NaN	288.560120	Inactive	1401

5 rows × 99 columns

We now have the raw survey data for 1401.

Next we can enrich this by adding attributes about each household.

Adding Attributes

The add_attribute function can append additional attribute columns based on a household ID.

This allows easy segmentation and analysis by attributes like province and urban/rural status.

In [3]:

df = lfsir.add_attribute(df, "Province")

We added the province name for each household.

In [4]:

df = lfsir.add_attribute(df, "Urban_Rural")

Now we also have the urban/rural status.

Let's confirm that worked by peeking at the data.

In [5]:

df[["ID", "Province", "Urban_Rural"]].sample(20)

Out[5]:

	ID	Province	Urban_Rural
306048	113102066	Hamadan	Urban
369154	116103873	Ilam	Urban
527914	123119058	Tehran	Urban
663295	130108750	Alborz	Urban
629745	129100266	South_Khorasan	Urban
172061	107110472	Fars	Urban
30129	101106368	Gilan	Urban
5486	100104358	Markazi	Urban
654920	130100659	Alborz	Urban
485520	122102263	Hormozgan	Urban
629699	129100260	South_Khorasan	Urban
74420	103111254	East_Azerbaijan	Urban
34036	101209758	Gilan	Rural
445900	120102167	Semnan	Urban
185939	108100661	Kerman	Urban
372721	116106558	Ilam	Urban
8891	100106769	Markazi	Urban
349919	115106162	Lorestan	Urban
541529	124207162	Ardabil	Rural
603515	127210962	Golestan	Rural
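In the 20 sampled rows, the second and third digits of ID change with the province, and the fourth digit is 1 for every urban row and 2 for every rural row. The sketch below encodes only that observed pattern as a quick sanity check on the added columns. The names guess_attributes, OBSERVED_PROVINCES and OBSERVED_AREAS are hypothetical, and the pattern is an inference from this sample, not a documented rule or a description of how add_attribute works internally.

```python
# Province codes read off digits 2-3 of ID in the 20 sampled rows.
# The table is incomplete by construction: unseen provinces are missing.
OBSERVED_PROVINCES = {
    "00": "Markazi", "01": "Gilan", "03": "East_Azerbaijan", "07": "Fars",
    "08": "Kerman", "13": "Hamadan", "15": "Lorestan", "16": "Ilam",
    "20": "Semnan", "22": "Hormozgan", "23": "Tehran", "24": "Ardabil",
    "27": "Golestan", "29": "South_Khorasan", "30": "Alborz",
}
# Fourth digit of ID as seen in the sample (assumed, not documented).
OBSERVED_AREAS = {"1": "Urban", "2": "Rural"}


def guess_attributes(household_id):
    """Guess (province, urban_rural) from an ID; None where a code is unseen."""
    digits = str(household_id)
    province = OBSERVED_PROVINCES.get(digits[1:3])
    area = OBSERVED_AREAS.get(digits[3])
    return province, area


print(guess_attributes(113102066))  # ('Hamadan', 'Urban')
print(guess_attributes(124207162))  # ('Ardabil', 'Rural')
```

For real work the columns returned by add_attribute remain the source of truth; a helper like this is only useful for spotting rows where the ID and the attribute disagree.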

Adding Classification

The add_classification function can decode classification codes like industry and occupation.

It takes columns containing codes like ISIC and ISCO, and decodes them into descriptive categories across multiple levels of hierarchy.

For example with ISIC industry codes:

In [6]:

df = lfsir.add_classification(df, target="Main_Job_Workplace_ISIC_Code")

This adds a new column named "Industry" that contains human-readable titles derived from the ISIC classification system.

Let's confirm the new column was added:

In [7]:

df[["Main_Job_Workplace_ISIC_Code", "Industry"]].dropna().sample(20)

Out[7]:

	Main_Job_Workplace_ISIC_Code	Industry
134132	47213	Wholesale and retail trade; repair of motor ve...
592696	47710	Wholesale and retail trade; repair of motor ve...
133606	25920	Manufacturing
100504	13931	Manufacturing
29505	85102	Education
650579	1110	Agriculture, forestry and fishing
602545	1110	Agriculture, forestry and fishing
633658	41000	Construction
230878	1440	Agriculture, forestry and fishing
478026	49230	Transportation and storage
517256	84110	Public administration and defence; compulsory ...
210391	47531	Wholesale and retail trade; repair of motor ve...
584111	49210	Transportation and storage
236889	47410	Wholesale and retail trade; repair of motor ve...
317613	1630	Agriculture, forestry and fishing
475357	41000	Construction
97733	1610	Agriculture, forestry and fishing
320670	1500	Agriculture, forestry and fishing
490271	49230	Transportation and storage
422009	49230	Transportation and storage

So with a single line we have effectively joined a mapping table to decode the original numeric codes into descriptive categories.
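To make the idea of a mapping table concrete, here is a minimal standalone sketch of decoding a code to its top-level ISIC Rev. 4 section. It assumes that the codes are five digits wide, that leading zeros are dropped when they are stored as integers (so 1110 stands for 01110), and that the first two digits are the ISIC division. The names SECTION_RANGES and decode_section are hypothetical, only the sections visible in the output above are listed, and the library's own implementation may well differ.

```python
# ISIC Rev. 4 sections keyed by the range of two-digit divisions they cover.
# Only the sections that appear in the sample output are listed here.
SECTION_RANGES = [
    (1, 3, "Agriculture, forestry and fishing"),
    (10, 33, "Manufacturing"),
    (41, 43, "Construction"),
    (45, 47, "Wholesale and retail trade; repair of motor vehicles and motorcycles"),
    (49, 53, "Transportation and storage"),
    (84, 84, "Public administration and defence; compulsory social security"),
    (85, 85, "Education"),
]


def decode_section(code, width=5):
    """Return the section title for a numeric code, or None if not listed."""
    # Assumption: restore dropped leading zeros, then read the division.
    division = int(str(code).zfill(width)[:2])
    for first, last, title in SECTION_RANGES:
        if first <= division <= last:
            return title
    return None


print(decode_section(1110))   # Agriculture, forestry and fishing
print(decode_section(25920))  # Manufacturing
```

Keeping the ranges in a small table rather than in branching logic mirrors the join described above: extending the decoder means adding rows, not code.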

We now have an enriched dataset ready for further analysis and visualization!
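As one possible next step, the Weight column can be used to turn row counts into weighted shares. The sketch below assumes Weight is a sampling weight (roughly, how many people each row represents), which this page does not state explicitly. It uses only the five rows printed by df.head() above, so the resulting numbers illustrate the mechanics and say nothing about the actual labor market. The function name weighted_shares is hypothetical.

```python
from collections import defaultdict

# (Activity_State, Weight) pairs copied from the five rows shown by df.head().
rows = [
    ("Inactive", 274.358093),
    ("Inactive", 274.358093),
    ("Employed", 288.560120),
    ("Inactive", 288.560120),
    ("Inactive", 288.560120),
]


def weighted_shares(pairs):
    """Sum weights per category and return each category's share of the total."""
    totals = defaultdict(float)
    for category, weight in pairs:
        totals[category] += weight
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


shares = weighted_shares(rows)
for state, share in shares.items():
    print(f"{state}: {share:.1%}")
```

On the full table, the same totals could be computed with pandas as df.groupby("Activity_State")["Weight"].sum(), and grouping by "Province" or "Urban_Rural" instead would give the segmentation mentioned earlier.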