Getting Started
The LFSIR package provides easy access to Iran's Labor Force Survey data. With it, we can load the survey tables and enrich them with additional information.
import lfsir
Loading Survey Data
The load_table function loads survey data for a given year.
Let's load the data for 1401 (Iranian calendar, roughly March 2022 to March 2023).
df = lfsir.load_table(years=1401)
df.head()
| | ID | Census_Turn | Alternative_Household | Member_Number | Relationship | Sex | Birth_Month | Birth_Year | Age | Nationality | ... | Ready_To_Start_Job_If_Found_Last_Week_2 | Reason_Not_Ready_For_Work | Situation_Last_Week | Preferred_Hours_Per_Day | Preferred_Days_Per_Week | Preferred_Employment_Status | Preferred_Work_Sector | Weight | Activity_State | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 110105161 | 3 | False | 2 | Spouse | Female | <NA> | 49 | 52 | Iranian | ... | <NA> | NaN | Student | <NA> | <NA> | NaN | NaN | 274.358093 | Inactive | 1401 |
1 | 110105161 | 3 | False | 3 | Child | Female | <NA> | 77 | 24 | Iranian | ... | <NA> | NaN | Employed | <NA> | <NA> | NaN | NaN | 274.358093 | Inactive | 1401 |
2 | 110105162 | 2 | False | 1 | Head | Male | 3 | 50 | 50 | Iranian | ... | <NA> | NaN | NaN | <NA> | <NA> | NaN | NaN | 288.560120 | Employed | 1401 |
3 | 110105162 | 2 | False | 2 | Spouse | Female | 6 | 56 | 44 | Iranian | ... | <NA> | NaN | Student | <NA> | <NA> | NaN | NaN | 288.560120 | Inactive | 1401 |
4 | 110105162 | 2 | False | 3 | Child | Male | 2 | 77 | 24 | Iranian | ... | <NA> | NaN | Employed | <NA> | <NA> | NaN | NaN | 288.560120 | Inactive | 1401 |
5 rows × 99 columns
We now have the raw survey data for 1401.
Next we can enrich this by adding attributes about each household.
Adding Attributes
The add_attribute function appends additional attribute columns based on the household ID.
This allows easy segmentation and analysis by attributes like province and urban/rural status.
df = lfsir.add_attribute(df, "Province")
We added the province name for each household.
df = lfsir.add_attribute(df, "Urban_Rural")
Now we also have the urban/rural status.
Let's confirm that worked by peeking at the data.
df[["ID", "Province", "Urban_Rural"]].sample(20)
| | ID | Province | Urban_Rural |
---|---|---|---|
306048 | 113102066 | Hamadan | Urban |
369154 | 116103873 | Ilam | Urban |
527914 | 123119058 | Tehran | Urban |
663295 | 130108750 | Alborz | Urban |
629745 | 129100266 | South_Khorasan | Urban |
172061 | 107110472 | Fars | Urban |
30129 | 101106368 | Gilan | Urban |
5486 | 100104358 | Markazi | Urban |
654920 | 130100659 | Alborz | Urban |
485520 | 122102263 | Hormozgan | Urban |
629699 | 129100260 | South_Khorasan | Urban |
74420 | 103111254 | East_Azerbaijan | Urban |
34036 | 101209758 | Gilan | Rural |
445900 | 120102167 | Semnan | Urban |
185939 | 108100661 | Kerman | Urban |
372721 | 116106558 | Ilam | Urban |
8891 | 100106769 | Markazi | Urban |
349919 | 115106162 | Lorestan | Urban |
541529 | 124207162 | Ardabil | Rural |
603515 | 127210962 | Golestan | Rural |
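The sampled rows above hint at how such attributes can be derived from the ID alone: in every row shown, the second and third digits of the ID match the province, and the fourth digit is 1 for Urban and 2 for Rural. The sketch below reproduces that pattern in plain Python. It is inferred from the sample output only and is not necessarily how add_attribute works internally; `decode_household_id` and both lookup tables are hypothetical, and the province table covers only the codes visible above.

```python
# Province codes observed in the sample output above; NOT a complete list.
OBSERVED_PROVINCES = {
    "00": "Markazi",
    "01": "Gilan",
    "03": "East_Azerbaijan",
    "07": "Fars",
    "08": "Kerman",
    "13": "Hamadan",
    "15": "Lorestan",
    "16": "Ilam",
    "20": "Semnan",
    "22": "Hormozgan",
    "23": "Tehran",
    "24": "Ardabil",
    "27": "Golestan",
    "29": "South_Khorasan",
    "30": "Alborz",
}
# Fourth digit of the ID, as observed in the same sample.
OBSERVED_AREA = {"1": "Urban", "2": "Rural"}


def decode_household_id(household_id):
    """Hypothetical helper: split a 9-digit ID the way the sample rows suggest."""
    digits = str(household_id)
    province = OBSERVED_PROVINCES.get(digits[1:3])  # None if the code was not seen above
    area = OBSERVED_AREA.get(digits[3])             # None for any other digit
    return province, area


print(decode_household_id(113102066))
print(decode_household_id(101209758))
```

For real work, add_attribute remains the better choice, since it carries the complete mapping; the sketch is only meant to make the idea of ID-based attributes concrete.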
Adding Classification
The add_classification function decodes classification codes such as industry and occupation.
It takes columns containing codes like ISIC and ISCO, and decodes them into descriptive categories across multiple levels of hierarchy.
For example, with ISIC industry codes:
df = lfsir.add_classification(df, target="Main_Job_Workplace_ISIC_Code")
This adds a new column named "Industry" that contains human-readable titles derived from the ISIC classification system.
Let's confirm the new column was added:
df[["Main_Job_Workplace_ISIC_Code", "Industry"]].dropna().sample(20)
| | Main_Job_Workplace_ISIC_Code | Industry |
---|---|---|
134132 | 47213 | Wholesale and retail trade; repair of motor ve... |
592696 | 47710 | Wholesale and retail trade; repair of motor ve... |
133606 | 25920 | Manufacturing |
100504 | 13931 | Manufacturing |
29505 | 85102 | Education |
650579 | 1110 | Agriculture, forestry and fishing |
602545 | 1110 | Agriculture, forestry and fishing |
633658 | 41000 | Construction |
230878 | 1440 | Agriculture, forestry and fishing |
478026 | 49230 | Transportation and storage |
517256 | 84110 | Public administration and defence; compulsory ... |
210391 | 47531 | Wholesale and retail trade; repair of motor ve... |
584111 | 49210 | Transportation and storage |
236889 | 47410 | Wholesale and retail trade; repair of motor ve... |
317613 | 1630 | Agriculture, forestry and fishing |
475357 | 41000 | Construction |
97733 | 1610 | Agriculture, forestry and fishing |
320670 | 1500 | Agriculture, forestry and fishing |
490271 | 49230 | Transportation and storage |
422009 | 49230 | Transportation and storage |
So with a single line we have effectively joined a mapping table to decode the original numeric codes into descriptive categories.
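To make the idea of decoding through a mapping table concrete, here is a minimal sketch in plain Python. It assumes that the codes are five digits long with a dropped leading zero (so 1110 stands for 01110), that the first two digits identify the ISIC division, and that the titles shown above are ISIC Rev. 4 section names. These assumptions are consistent with the sample output but are not confirmed by the package; `isic_section` and `SECTION_RANGES` are hypothetical names, and the table covers only the sections that appear in the sample.

```python
# ISIC Rev. 4 sections for the divisions that appear in the sample above.
# Each entry is (first division, last division, section title); partial on purpose.
SECTION_RANGES = [
    (1, 3, "Agriculture, forestry and fishing"),
    (10, 33, "Manufacturing"),
    (41, 43, "Construction"),
    (45, 47, "Wholesale and retail trade; repair of motor vehicles and motorcycles"),
    (49, 53, "Transportation and storage"),
    (84, 84, "Public administration and defence; compulsory social security"),
    (85, 85, "Education"),
]


def isic_section(code):
    """Hypothetical helper: map a 5-digit ISIC code to its top-level section."""
    padded = str(code).zfill(5)  # restore a dropped leading zero, e.g. 1110 -> "01110"
    division = int(padded[:2])   # the first two digits identify the division
    for first, last, title in SECTION_RANGES:
        if first <= division <= last:
            return title
    return None  # division not covered by this partial table


for code in (47213, 25920, 1110, 49230):
    print(code, "->", isic_section(code))
```

The function could be applied to a column with pandas' `Series.map`, but add_classification already does this with the complete classification and, as noted above, at more than one level of the hierarchy.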
We now have an enriched dataset ready for further analysis and visualization!
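As one possible first step in such an analysis, the sketch below sums the Weight column per Activity_State, using only the five rows displayed by df.head() earlier. It assumes that Weight is a survey expansion weight (how many people each respondent represents), which the source does not state explicitly; `weighted_totals` is a hypothetical helper written in plain Python so that it runs without the package or the data.

```python
from collections import defaultdict

# (Activity_State, Weight) pairs copied from the five rows shown by df.head().
rows = [
    ("Inactive", 274.358093),
    ("Inactive", 274.358093),
    ("Employed", 288.560120),
    ("Inactive", 288.560120),
    ("Inactive", 288.560120),
]


def weighted_totals(pairs):
    """Hypothetical helper: sum the weights within each category."""
    totals = defaultdict(float)
    for state, weight in pairs:
        totals[state] += weight
    return dict(totals)


totals = weighted_totals(rows)
grand_total = sum(totals.values())
for state, total in totals.items():
    # Print each category's weighted total and its share of the overall weight.
    print(f"{state}: {total:.1f} ({total / grand_total:.1%})")
```

On the full table, the same aggregation can be written with pandas as `df.groupby("Activity_State")["Weight"].sum()`, and grouping by "Province" or "Industry" as well would break the totals down by the columns added above.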