Data Wrangling: Karachi Property Prices

Author

Ricky Macharm

Published

January 4, 2024

Introduction

This dataset contains 8,414 housing advertisements scraped from Zameen.com, all for properties located in Karachi. It is available on Kaggle, a popular platform for data science and machine learning enthusiasts, under the title Karachi, Pakistan Property Prices 2023.

Note that this dataset is only a fraction of the data available on the site. It was scraped and compiled by Faiq Ali, who, at the time, was a student at the University of Malaya. The dataset, listed on his Kaggle page, gives a view of the Karachi real estate market as of 2023. In this post we will wrangle the data and start preparing it for machine learning.

import pandas as pd
file_path = "karachi-pakistan-property-prices-2023.csv" 
def prep_karachi_data(file_path):
    return (pd.read_csv(file_path)
           )

df = prep_karachi_data(file_path=file_path)
print(df.info())
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8414 entries, 0 to 8413
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   title              8414 non-null   object
 1   price              8413 non-null   object
 2   date added         8047 non-null   object
 3   type               8413 non-null   object
 4   bedrooms           8414 non-null   int64 
 5   bathrooms          8414 non-null   int64 
 6   area               7322 non-null   object
 7   location           8047 non-null   object
 8   complete location  8413 non-null   object
 9   description        8413 non-null   object
 10  keywords           7551 non-null   object
 11  url                8414 non-null   object
dtypes: int64(2), object(10)
memory usage: 788.9+ KB
None
title price date added type bedrooms bathrooms area location complete location description keywords url
0 600 Yard Bungalow For Sale In DHA Phase 6 11.5 Crore 14 hours ago House 5 6 600 Sq. Yd. DHA Defence, Karachi, Sindh DHA Phase 6, DHA Defence, Karachi, Sindh Chance Deal 600 Yard Bungalow For Sale Built in year: 1,Parking Spaces: 5,Flooring,Ot... https://www.zameen.com/Property/d_h_a_dha_phas...
1 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... 1.45 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Ali Block, Bahria Town - Precinc... 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_pr...
2 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... 2.12 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Sports City, Bahria Town Karachi, Karac... 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... Bedrooms: 4,Bathrooms: 4,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka...
3 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... 1.5 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Precinct 31, Bahria Town Karachi... 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka...
4 Buying A Flat In Clifton - Block 9? 4 Crore 6 hours ago Flat 3 3 200 Sq. Yd. Clifton, Karachi, Sindh Clifton - Block 9, Clifton, Karachi, Sindh Apartment for sale Flooring,Electricity Backup,Broadband Internet... https://www.zameen.com/Property/clifton_clifto...

One of the first things we will do is rename some of the columns to get rid of the spaces in their names.

def prep_karachi_data(file_path):
    return (pd.read_csv(file_path)
            .rename(columns=lambda x: x.replace(" ", "_")) #no spaces in column names
           )

df = prep_karachi_data(file_path=file_path)
print(df.info())
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8414 entries, 0 to 8413
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   title              8414 non-null   object
 1   price              8413 non-null   object
 2   date_added         8047 non-null   object
 3   type               8413 non-null   object
 4   bedrooms           8414 non-null   int64 
 5   bathrooms          8414 non-null   int64 
 6   area               7322 non-null   object
 7   location           8047 non-null   object
 8   complete_location  8413 non-null   object
 9   description        8413 non-null   object
 10  keywords           7551 non-null   object
 11  url                8414 non-null   object
dtypes: int64(2), object(10)
memory usage: 788.9+ KB
None
title price date_added type bedrooms bathrooms area location complete_location description keywords url
0 600 Yard Bungalow For Sale In DHA Phase 6 11.5 Crore 14 hours ago House 5 6 600 Sq. Yd. DHA Defence, Karachi, Sindh DHA Phase 6, DHA Defence, Karachi, Sindh Chance Deal 600 Yard Bungalow For Sale Built in year: 1,Parking Spaces: 5,Flooring,Ot... https://www.zameen.com/Property/d_h_a_dha_phas...
1 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... 1.45 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Ali Block, Bahria Town - Precinc... 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_pr...
2 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... 2.12 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Sports City, Bahria Town Karachi, Karac... 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... Bedrooms: 4,Bathrooms: 4,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka...
3 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... 1.5 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Precinct 31, Bahria Town Karachi... 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka...
4 Buying A Flat In Clifton - Block 9? 4 Crore 6 hours ago Flat 3 3 200 Sq. Yd. Clifton, Karachi, Sindh Clifton - Block 9, Clifton, Karachi, Sindh Apartment for sale Flooring,Electricity Backup,Broadband Internet... https://www.zameen.com/Property/clifton_clifto...

Next, we will pull out the second column, price, for a closer look. We want it to be numeric, but we can see that its dtype is object.

df['price']
0       11.5 Crore
1       1.45 Crore
2       2.12 Crore
3        1.5 Crore
4          4 Crore
           ...    
8409       5 Crore
8410     1.2 Crore
8411    1.55 Crore
8412       70 Lakh
8413       1 Crore
Name: price, Length: 8414, dtype: object

The column mixes two units: Lakh and Crore.

“Lakh” and “Crore” are units of numerical value commonly used in the Indian subcontinent, including countries like India, Pakistan, Bangladesh, and Nepal. They are part of the South Asian numbering system and are widely used in these regions for financial transactions, population counts, and more.

  1. Lakh:
    • One lakh is equal to 100,000 (10^5).
    • For example, in international notation, 5 lakh would be written as 500,000.
  2. Crore:
    • One crore is equal to 10 million, or 100 lakh (10^7).
    • In international notation, 1 crore would be expressed as 10,000,000.

These terms provide a more convenient way to express large numbers, particularly in the context of financial transactions and population statistics in the Indian subcontinent. For instance, it’s more common to hear about a budget of 5 crore rupees rather than 50 million rupees.
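The arithmetic above can be sketched in a few lines of plain Python. The names UNIT_IN_RUPEES, to_rupees and crore_to_lakh are made up for this illustration and are not part of the dataset or of pandas.

```python
# Multipliers for the two units described above (hypothetical helper names).
UNIT_IN_RUPEES = {"Lakh": 100_000, "Crore": 10_000_000}


def to_rupees(amount, unit):
    """Convert an amount given in lakh or crore to plain rupees."""
    return amount * UNIT_IN_RUPEES[unit]


def crore_to_lakh(amount):
    """One crore is 100 lakh, so multiply by 100."""
    return amount * 100


print(to_rupees(5, "Lakh"))    # 500000
print(to_rupees(1, "Crore"))   # 10000000
print(crore_to_lakh(11.5))     # 1150.0
```

The last line is the conversion that matters for this dataset: a price listed as 11.5 Crore is the same as 1150 Lakh.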

We will split that column into two, one for the figures and the other for the units. The values and the units are separated by a space.

df['price'].str.split(" ", expand=True)
         0      1
0     11.5  Crore
1     1.45  Crore
2     2.12  Crore
3      1.5  Crore
4        4  Crore
...    ...    ...
8409     5  Crore
8410   1.2  Crore
8411  1.55  Crore
8412    70   Lakh
8413     1  Crore

8414 rows × 2 columns
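Since the price column has one missing value, it helps to see how the split behaves on a row with no price. The following is a minimal sketch on toy data (three made-up rows, not taken from the dataset). It uses pd.to_numeric with errors="coerce" as a defensive alternative to a plain float conversion; that choice is an assumption of this sketch, not what the function in this post uses.

```python
import pandas as pd

# Toy data (not from the dataset): two prices and one missing value.
prices = pd.Series(["11.5 Crore", "70 Lakh", None], name="price")

# Split on the first space only; expand=True returns a DataFrame
# with one column per piece, labelled 0 and 1.
parts = prices.str.split(" ", n=1, expand=True)

# Anything that cannot be parsed as a number becomes NaN.
amounts = pd.to_numeric(parts[0], errors="coerce")
units = parts[1]

print(amounts)
print(units)
```

The missing price stays missing in both pieces, so it can be handled later in one place.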

We will now add this split to the function we are slowly building, storing the number in a new price_ column and the unit in a new currency_name column. We will also convert price_ to float.

def prep_karachi_data(file_path):
    return (pd.read_csv(file_path)
            .rename(columns=lambda x: x.replace(" ", "_")) #no spaces in column names
            .assign(price_ = lambda x: x["price"].str.split(" ", expand=True)[0],
                  currency_name = lambda x: x["price"].str.split(" ", expand=True)[1])
            .astype({"price_":float})
           )

df = prep_karachi_data(file_path=file_path)
print(df.info())
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8414 entries, 0 to 8413
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   title              8414 non-null   object 
 1   price              8413 non-null   object 
 2   date_added         8047 non-null   object 
 3   type               8413 non-null   object 
 4   bedrooms           8414 non-null   int64  
 5   bathrooms          8414 non-null   int64  
 6   area               7322 non-null   object 
 7   location           8047 non-null   object 
 8   complete_location  8413 non-null   object 
 9   description        8413 non-null   object 
 10  keywords           7551 non-null   object 
 11  url                8414 non-null   object 
 12  price_             8413 non-null   float64
 13  currency_name      8413 non-null   object 
dtypes: float64(1), int64(2), object(11)
memory usage: 920.4+ KB
None
title price date_added type bedrooms bathrooms area location complete_location description keywords url price_ currency_name
0 600 Yard Bungalow For Sale In DHA Phase 6 11.5 Crore 14 hours ago House 5 6 600 Sq. Yd. DHA Defence, Karachi, Sindh DHA Phase 6, DHA Defence, Karachi, Sindh Chance Deal 600 Yard Bungalow For Sale Built in year: 1,Parking Spaces: 5,Flooring,Ot... https://www.zameen.com/Property/d_h_a_dha_phas... 11.50 Crore
1 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... 1.45 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Ali Block, Bahria Town - Precinc... 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_pr... 1.45 Crore
2 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... 2.12 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Sports City, Bahria Town Karachi, Karac... 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... Bedrooms: 4,Bathrooms: 4,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka... 2.12 Crore
3 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... 1.5 Crore 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Precinct 31, Bahria Town Karachi... 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka... 1.50 Crore
4 Buying A Flat In Clifton - Block 9? 4 Crore 6 hours ago Flat 3 3 200 Sq. Yd. Clifton, Karachi, Sindh Clifton - Block 9, Clifton, Karachi, Sindh Apartment for sale Flooring,Electricity Backup,Broadband Internet... https://www.zameen.com/Property/clifton_clifto... 4.00 Crore

Next, we will multiply all the Crore values by 100 to convert them to Lakh, so that every price is expressed in the same unit. Our code will check the currency_name column to see whether the value is Crore before making the conversion, and then replace the price column with the result. After that, we drop the helper columns we just created. We will use the mask method for the conversion.

The mask method replaces values in a DataFrame or Series wherever a given condition is True and leaves all other values unchanged.

.assign(price = lambda x: x["price_"].mask(x["currency_name"] == "Crore", x["price_"] * 100))
  1. Using mask inside .assign:
    • The code is creating or modifying the ‘price’ column in the DataFrame.
    • x["price_"].mask(...): This applies the mask method on the price_ column of the DataFrame.
  2. Condition in mask:
    • The first argument of mask is a condition: x["currency_name"] == "Crore". This checks each row in the currency_name column to see if it equals “Crore”.
  3. Replacement in mask:
    • The second argument of mask is x["price_"] * 100. This is what the mask method will replace the original value with, but only where the condition is met (i.e., where ‘currency_name’ is “Crore”).
  4. How It Works in our Example:
    • For each row in the DataFrame, the code checks if the currency_name for that row is “Crore”.
    • If it is “Crore”, the value in the price_ column is multiplied by 100 and the result is stored in the price column.
    • If it is not “Crore”, the value from the price_ column is copied to the price column unchanged.
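The steps above can be tried on a small toy DataFrame. This is a minimal sketch: the three rows are made up for illustration, and only the column names price_ and currency_name come from the function in this post.

```python
import pandas as pd

# Toy data (not from the dataset): amounts with their units.
toy = pd.DataFrame({
    "price_": [11.5, 70.0, 1.0],
    "currency_name": ["Crore", "Lakh", "Crore"],
})

# Where the unit is Crore, replace the amount with amount * 100 (now in Lakh);
# everywhere else, keep the original amount.
toy = toy.assign(
    price=lambda x: x["price_"].mask(x["currency_name"] == "Crore", x["price_"] * 100)
)

# where is the mirror image of mask: it keeps values where the condition
# is True and replaces the rest, so the condition has to be inverted.
same = toy["price_"].where(toy["currency_name"] != "Crore", toy["price_"] * 100)

print(toy["price"].tolist())  # [1150.0, 70.0, 100.0]
print(same.tolist())          # [1150.0, 70.0, 100.0]
```

Both versions give the same result. mask tends to read more naturally here because the condition describes the rows we want to change.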

We will now insert that code in the function we are building and also delete the new columns we created since we no longer need them.

def prep_karachi_data(file_path):
    return (pd.read_csv(file_path)
            .rename(columns=lambda x: x.replace(" ", "_")) #no spaces in column names
            .assign(price_ = lambda x: x["price"].str.split(" ", expand=True)[0],
                  currency_name = lambda x: x["price"].str.split(" ", expand=True)[1])
            .astype({"price_":float})
            .assign(price = lambda x: x["price_"].mask(x["currency_name"] == "Crore", x["price_"] * 100))
            .drop(columns=['price_', 'currency_name'])
           )

df = prep_karachi_data(file_path=file_path)
print(df.info())
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8414 entries, 0 to 8413
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   title              8414 non-null   object 
 1   price              8413 non-null   float64
 2   date_added         8047 non-null   object 
 3   type               8413 non-null   object 
 4   bedrooms           8414 non-null   int64  
 5   bathrooms          8414 non-null   int64  
 6   area               7322 non-null   object 
 7   location           8047 non-null   object 
 8   complete_location  8413 non-null   object 
 9   description        8413 non-null   object 
 10  keywords           7551 non-null   object 
 11  url                8414 non-null   object 
dtypes: float64(1), int64(2), object(9)
memory usage: 788.9+ KB
None
title price date_added type bedrooms bathrooms area location complete_location description keywords url
0 600 Yard Bungalow For Sale In DHA Phase 6 1150.0 14 hours ago House 5 6 600 Sq. Yd. DHA Defence, Karachi, Sindh DHA Phase 6, DHA Defence, Karachi, Sindh Chance Deal 600 Yard Bungalow For Sale Built in year: 1,Parking Spaces: 5,Flooring,Ot... https://www.zameen.com/Property/d_h_a_dha_phas...
1 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... 145.0 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Ali Block, Bahria Town - Precinc... 3 BEDS LUXURY 125 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_pr...
2 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... 212.0 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Sports City, Bahria Town Karachi, Karac... 4 BEDS LUXURY SPORTS CITY VILLA FOR RENT BAHRI... Bedrooms: 4,Bathrooms: 4,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka...
3 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... 150.0 5 hours ago House 0 0 NaN Bahria Town Karachi, Karachi, Sindh Bahria Town - Precinct 31, Bahria Town Karachi... 3 BEDS LUXURY 235 SQ YARDS VILLA FOR SALE LOCA... Bedrooms: 3,Bathrooms: 3,Kitchens: 2 https://www.zameen.com/Property/bahria_town_ka...
4 Buying A Flat In Clifton - Block 9? 400.0 6 hours ago Flat 3 3 200 Sq. Yd. Clifton, Karachi, Sindh Clifton - Block 9, Clifton, Karachi, Sindh Apartment for sale Flooring,Electricity Backup,Broadband Internet... https://www.zameen.com/Property/clifton_clifto...

We may have other columns to drop later on, and we would add them to the .drop call. But this is all we are going to do for now. We have used method chaining to clean our data. Note that the price column still has one null value (8413 non-null out of 8414 rows), which we have not dropped yet.
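One possible way to handle a missing price later is dropna with the subset argument, which could be added as one more step in the chain. The sketch below runs on a made-up three-row DataFrame rather than the real data, and whether dropping is the right choice for this dataset is left open.

```python
import pandas as pd

# Toy data (not from the dataset): one listing has no price.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "price": [1150.0, None, 70.0],
})

# Drop only the rows where price is missing, then renumber the index.
cleaned = toy.dropna(subset=["price"]).reset_index(drop=True)

print(cleaned)
```

Restricting dropna to the price column matters here, because columns such as area and keywords have many missing values and a bare dropna() would discard those rows as well.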

This will be all for today. Thanks for reading.