Data Validation

Data Processing Methodology:

To contribute to the broader field of study, the IBSO is committed to openness and academic rigour. Our comprehensive data processing methodology, integral to the academic research underpinning this analysis, is detailed in an academic journal paper that is currently under review. Upon completion of the review process, the methodology, along with all associated scripts, will be made available on this platform.
In the interim, beta scripts are available for download on GitHub:

Broad processing steps are outlined below:

Data Processing Introduction:

The BER database includes 213 variables describing over 1 million dwellings, representing over 50% of Irish dwellings. Of these, 45 assessor-inputted (and hence manually entered) variables were analysed for erroneous and outlier data. The data processing methodology follows a rigorous and systematic approach to ensure the accuracy and reliability of our dataset. The process is designed to be replicable and transparent, adhering to best practices in data processing and analysis. The data processing workflow is outlined below in steps 1 to 5.

Step 1: Loading Data

We utilise a range of Python libraries to import and manage our data efficiently. The data is initially loaded in its raw (.txt), unfiltered form for preliminary analysis.
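As an illustration, a minimal Python sketch of this loading step is shown below. The file name, delimiter and encoding are assumptions and should be adjusted to match the actual raw export.

    import pandas as pd

    # Load the raw, unfiltered text export for preliminary analysis.
    # File name, separator and encoding are assumptions, not the actual export settings.
    raw = pd.read_csv(
        "BERPublicsearch.txt",   # hypothetical file name
        sep="\t",                # assumed tab-delimited text file
        low_memory=False,        # avoid mixed-type warnings on a wide file
        encoding="latin-1",      # assumed encoding; change if the export differs
    )

    print(raw.shape)  # rows x columns (213 variables expected)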

Step 2: Data Consistency Checks

At this step, we add unique identifiers to each data entry and perform consistency checks. This process involves ensuring the accuracy of various informational data points, such as location and assessment dates, e.g. removing entries where the assessment date is in the future.
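A minimal sketch of these checks, continuing from the raw DataFrame loaded above, might look as follows; the column name DateOfAssessment and the simple row-based identifier are illustrative assumptions.

    import pandas as pd

    df = raw.copy()

    # Add a unique identifier to each data entry (simple row-based ID as a sketch).
    df["record_id"] = range(1, len(df) + 1)

    # Remove entries whose assessment date lies in the future.
    df["DateOfAssessment"] = pd.to_datetime(df["DateOfAssessment"], errors="coerce")
    today = pd.Timestamp.today().normalize()
    df = df[df["DateOfAssessment"].notna() & (df["DateOfAssessment"] <= today)]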

Step 3: Outlier Identification

Using advanced statistical methods, we identify and categorise outliers in the dataset. This step is crucial for maintaining the integrity of the data analysis.
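The statistical methods used are detailed in the paper under review; as one common illustration, the sketch below flags outliers with Tukey's IQR fences. The column name UValueWall is an assumed example of a manually inputted numeric variable.

    import pandas as pd

    def flag_iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
        """Return a boolean mask marking values outside the Tukey IQR fences."""
        q1, q3 = series.quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        return (series < lower) | (series > upper)

    # Example: flag potential outliers in an assumed numeric column.
    df["WallUValue_outlier"] = flag_iqr_outliers(df["UValueWall"])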

Step 4: Data Segmentation

In cases of bimodal data distributions, we segment the data for more precise analysis. This step involves dividing data based on relevant categories to enhance the accuracy of our findings.
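As a sketch of this segmentation, the snippet below repeats the outlier check within groups defined by an assumed categorical column (DwellingTypeDescr), so that a bimodal distribution arising from distinct dwelling categories is treated per segment.

    # Reuses flag_iqr_outliers and df from the earlier sketches.
    # The grouping column and numeric column are illustrative assumptions.
    for name, group in df.groupby("DwellingTypeDescr"):
        mask = flag_iqr_outliers(group["UValueWall"])
        print(f"{name}: {mask.sum()} potential outliers flagged")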

Step 5: Application of Filters

After identifying outliers and erroneous data, we apply specific filters to refine the dataset. This process ensures that only the most relevant and accurate data is used in our analysis.
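In code, this filtering stage amounts to combining the flags from the previous steps into boolean masks; the sketch below is a simplified example, and the output file name is hypothetical.

    # Keep only rows that passed the consistency checks and were not flagged
    # as outliers. Column names follow the earlier illustrative sketches.
    clean = df[
        ~df["WallUValue_outlier"]            # drop flagged outliers
        & df["DateOfAssessment"].notna()     # keep valid assessment dates
    ].copy()

    clean.to_csv("ber_cleaned.csv", index=False)  # hypothetical output file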

