Real-Time Daylight Analysis with ML

Q: What is the significance of parameter 'd' when determining the performance of the ANN model in predicting Useful Daylight Illuminance (UDI) metrics, and how does it compare to other parameters?

Parameter 'd', representing the number of rays, plays a crucial role in determining the ANN model's performance. Choosing an optimal number of rays can significantly reduce the preparation and training time and improve prediction accuracy. For predicting UDI-s, d=40 was optimal, resulting in RMSE, MAE, R2 values of 7.977, 4.023, and 0.898, respectively. Compared to parameters x and w, 'd' was the least impactful on accuracy, evidenced by RMSE, MAE, R2 values of 9.822, 5.400, and 0.845 when used alone.

Q: What experimental findings support the claim that parameter 'w' significantly influences the ANN model's prediction accuracy?

The experimental findings supporting the claim that parameter 'w' significantly influences prediction accuracy include its high impact on RMSE, MAE, and R2 values. When 'w' was used alone, the model's performance metrics (RMSE: 9.120, MAE: 4.749, R2: 0.866) were better than when using 'x' (RMSE: 11.494, MAE: 6.122, R2: 0.787) or 'd' alone, indicating 'w' was the most pivotal in prediction accuracy.

Q: Discuss the potential limitations and future directions for the ANN model development as described in the document.

Potential limitations include the model's confinement to medium-sized buildings, single orientations, and use of default materials, which may not represent diverse real-world scenarios. Future directions involve expanding to accommodate various building sizes, orientations, and materials, and improving accuracy through additional sensor data. Considering more variables like multi-story designs and different environmental conditions can enhance the model's robustness.

Q: Explain how increasing the number of rays beyond a certain point can negatively affect the accuracy of the ANN model's daylight predictions.

Increasing the number of rays beyond an optimal point can introduce complexities that degrade the model's accuracy by adding unnecessary computational overhead without meaningful improvement. The document notes that with d=40, the model accurately predicts UDI metrics, but exceeding this number did not yield better results and sometimes resulted in less accurate predictions, emphasizing the importance of parameter optimization.

Q: What role do 3D visualizations in Rhino and Grasshopper play in verifying the ANN model's daylight predictions?

3D visualizations in Rhino and Grasshopper play a crucial role in verifying the ANN model's daylight predictions by providing a direct comparison between predicted and actual daylight distribution. These visualizations help in assessing the accuracy of the model's predictions, as they showcase how well the predicted daylight performance aligns with simulated outcomes in a visually intuitive manner.

Q: What are the implications of using the ANN model developed in this study for real-time daylight performance predictions?

The positive results of the ANN model imply its potential as a tool for real-time daylight validation. With high accuracy in predicting UDI metrics, it can replace repetitive simulations in optimization problems. Its adaptability to different metrics and tools suggests broad applicability, including in LEEDv4 certification processes by predicting metrics like ASE or sDA.

Q: Why does the ANN model perform better with UDI-e cases compared to other UDI metrics?

The ANN model performs better with UDI-e cases due to the predictable distribution patterns of daylight near windows, characterized by sensor points having high UDI values, making it easier for the model to predict accurately. In contrast, UDI-a, UDI-f, and UDI-s have a wider range of values and more complex patterns across sensor points, challenging the model's predictive capabilities.

Q: Analyze how the study's approach to improving machine learning models for daylight prediction can contribute to sustainable building design.

The study's approach contributes to sustainable building design by providing efficient daylight predictions, enabling architects to optimize natural light use, thereby reducing reliance on artificial lighting and improving energy efficiency. By streamlining the design process through accurate, rapid predictions, this approach supports informed decision-making, enhancing sustainability in building design with real-time feedback on daylight performance.

Q: How does the ANN model validate its performance in predicting daylight performance across different cases, and what visual evidence supports this validation?

The ANN model validates its performance through accurate predictions of unseen datasets, with R2 values over 0.890 for all UDI metrics. Visual validation comes from comparisons between predicted and simulation values displayed in 3D representations in Rhino and Grasshopper. Predicted UDI values accurately reflected daylight performance based on UDI ranges, agreeing with simulation results. Specifically, for UDI-e cases, sensor points near windows matched high predicted values, supporting the model's reliability.

Q: How does the chosen architecture of the ANN model, with 120 hidden nodes and three hidden layers, impact its prediction accuracy for different UDI metrics?

The architecture with 120 hidden nodes and three hidden layers is optimal for achieving high prediction accuracy across UDI metrics. This configuration balances complexity and computational efficiency, enabling the model to capture intricate patterns in daylight data effectively. As a result, it achieves the highest R2 values, especially for UDI-e with 0.995 and UDI-a with 0.975, demonstrating robust performance.

This document discusses using machine learning to enable real-time daylight analysis in building design. Specifically, it proposes a novel method of creating design variables that can characterize any building layout, and using these to train a machine learning model. The model is trained to efficiently predict useful daylight illuminance metrics for different building designs in Ho Chi Minh City, Vietnam, using data simulated by the DIVA tool. Results showed the machine learning model achieved excellent performance, demonstrating the potential for a data-driven platform to enable real-time daylight validation during building design.

Uploaded by

Victor Okhoya

Journal of Building Engineering 52 (2022) 104374

Contents lists available at ScienceDirect

Journal of Building Engineering

journal homepage: [Link]/locate/jobe

Machine learning-based real-time daylight analysis in buildings

Luan Le-Thanh a, Ha Nguyen-Thi-Viet a, Jaehong Lee c, H. Nguyen-Xuan b, c, *
a Faculty of Architecture, Van Lang University (VLU), Ho Chi Minh City, Viet Nam
b CIRTech Institute, HUTECH University, Ho Chi Minh City, Viet Nam
c Department of Architectural Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul, 05006, South Korea

ARTICLE INFO

Keywords: Machine learning (ML); Artificial neural network (ANN); DIVA; Daylight performance; Useful daylight illuminance (UDI); Building energy; AI-based real-time daylight validation

ABSTRACT

Daylight analysis is essential in building design to ensure indoor environment quality, including health and thermal comfort vis-à-vis energy. It is a repetitive and time-consuming process across design options. Several studies have used machine learning models to accurately predict daylight performance in particular design situations. Therefore, developing an AI-based real-time daylight analysis platform becomes more promising. However, buildings can be designed with arbitrary shapes, creating a real challenge for the AI to recognize any building layout. From that perspective, finding the design variables that characterize all building layouts becomes the key solution. To address this challenge, we propose a novel method of creating design variables and building a machine learning model that can efficiently forecast daylight performance with different building layouts. The daylight metric was Useful Daylight Illuminance with four ranges, and the case studies were assumed to be medium-sized buildings located in Ho Chi Minh City, Vietnam. All the data for training and prediction were created with the DIVA simulation tool. The results showed the excellent performance of the proposed approach, which is promising for developing a data-driven machine learning platform for real-time daylight validation. Moreover, the present framework can adapt to any specific machine learning model, daylight simulation tool, or daylight metric.

1. Introduction
People spend about 86.9% of their lifetime inside buildings for working or living [1]. Natural daylight is therefore needed inside these premises for a balanced habitat. A building with sufficient natural daylight can dramatically reduce the energy consumed by artificial lighting systems. On the other hand, admitting too much daylight can increase the cooling energy needed to remove the heat, or even cause glare and visual discomfort. Therefore, controlling daylight at the very beginning of the design procedure is a necessary task [2]. Nowadays, daylight simulation plays an essential part in achieving a green building certificate such as Leadership in Energy and Environmental Design (LEED) or Building Research Establishment Environmental Assessment Method (BREEAM) [3]. Over the years, researchers have developed methods and simulation tools that are classified into three main groups: static, dynamic, and climate-based daylight modeling [4]. The static method uses the Daylight Factor (DF) to evaluate daylight performance. DF is calculated as the ratio of the inside illuminance (Ei) at a given working plane to the outdoor illuminance (Eo) under an overcast sky [4]. Because this metric does not consider the climate conditions or even the façade directions, the simulation is fast,
but its results have poor reliability [4–6]. The dynamic method relied on the daylight coefficient (DC) introduced by Tregenza and Waters in 1983 [7]. In the DC method, the celestial hemisphere is divided into many sky segments, and the contribution of each segment to the total illuminance at specific sensor points is determined [7,8]. Studies confirmed that DC produced better results than DF, with relative errors of up to 25% compared with site measurements [4,9].

Fig. 1. Gradient maps representing the daylight metric values calculated by DIVA: (a) UDIUseful, 100–2000 lux; (b) DA, % of time >300 lux.

* Corresponding author. CIRTech Institute, HUTECH University, Ho Chi Minh City, Viet Nam
E-mail address: [Link]@[Link] (H. Nguyen-Xuan).

[Link]
Received 17 November 2021; Received in revised form 9 March 2022; Accepted 14 March 2022
Available online 18 March 2022
2352-7102/© 2022 Elsevier Ltd. All rights reserved.
L. Le-Thanh et al. Journal of Building Engineering 52 (2022) 104374

In 2000, a more effective method called climate-based daylight modeling (CBD) was introduced as an improved approach based on the DC [10,11]. CBD also depends on design contexts such as location, orientation, the opening ratio of the façade, and the variable sky luminance distribution from weather data [4]. The CBD method first determines sensor points on the analysis grid, as shown in Fig. 1. For a specific period and weather dataset, the illuminance at every sensor point is calculated to deliver the final daylight metrics [4,12]. CBD is regarded as the most reliable method to date [2,13–15].
Based on the daylight calculation methods, there are two types of daylight metrics: static and dynamic [4]. Static metrics, of which DF is the oldest representative [4,6], do not consider weather changes. In contrast, dynamic or climate-based metrics take into account the full range of daylight conditions. Standard dynamic metrics used to evaluate daylight performance inside buildings include Daylight Autonomy (DA), Spatial Daylight Autonomy (sDA), and Useful Daylight Illuminance (UDI) [6].
Various simulation tools are available [4]. However, the simulation process takes considerable time to achieve a reasonable result. Typically, a simulation takes from about 5 min to several hours, depending on the size of the building. For example, if a parametric model of a building generates 500 possible design options whose daylight performance must be investigated, calculating all the cases takes from about 42 h (500 × 5 min ≈ 41.7 h) to hundreds of hours. Real-life design problems involve a much larger number of cases. This is impractical, because designers need their time for creating design concepts rather than waiting for simulations. If a tool could handle daylight analysis in real time, designers could concentrate on developing ideas and improving design efficiency.
With advances in computer science, optimization algorithms and artificial intelligence help solve design problems faster. Some studies have used the simulation-based optimization (SBO) method to find the optimal solution faster [2,16–18]. However, the SBO method still requires simulation to find the optimal design. Several researchers have used ML to predict daylight metrics from past simulation data or site measurements, which saves significant simulation time [19–22].
An ML model is an algorithm that learns from relevant past data and automatically optimizes itself over time to predict or classify outputs, or even to recognize data patterns [23]. Generally speaking, a dataset used to train ML models is split into a train-set and a test-set, often with a ratio of 80/20. The model iteratively adjusts itself to minimize the errors between the predicted and actual outputs for the train-set inputs. Its performance is then validated on the test-set. Once well trained, the model can be used to predict new data for the underlying problem. Applying ML to daylight performance prediction can quickly deliver results from design inputs without any simulation. Developing an AI-based platform that performs daylight analysis in real time is therefore reasonable. To achieve this goal, the ML model used in the platform must be able to handle any design situation: whatever design is put into the tool, it needs to return acceptable daylight analysis results within seconds.
In recent years, several studies have applied ML to daylight prediction with various purposes and approaches. Ayoub (2020) presented a review of the use of ML in daylight prediction. Half of the reviewed studies used the Artificial Neural Network (ANN) model, while the remainder involved Multiple Linear Regression (MLR), Support Vector Machine (SVM), Decision Tree (DT), etc. In Hu and Olbina's study (2011), an ANN was employed to control blind angles automatically in real time. The ANN model learned to predict the illuminance values at two sensor points inside the building from weather data. The trained model then predicted the illuminance values used to calculate optimum slat angles. Lorenz (2018) used an ANN model to predict an atrium's Daylight Autonomy (DA) and sDA [24]. The whole processing time was 65% less than that of simulating all the possible design options, while accuracy remained high. Another study by Ngarambe (2020) compared several ML models for predicting daylight illuminance in a building.
The studies of Lorenz (2018) and Ngarambe (2020) concerned rectangular floor plans, and the AI-predicted results were accurate for floor plans of the same shape. Therefore, these methods are not suitable for constructing an AI-based real-time daylight analysis platform, which needs to handle different design situations. To fill this gap, this paper focuses on finding the design variables that can characterize all building layouts using a data-driven machine learning approach. The case study is a building in Ho Chi Minh City, Vietnam. The ANN model is trained and used to predict the UDI based on the simulation results. The mean absolute error (MAE), the root-mean-square error (RMSE), the mean squared error (MSE), and the coefficient of determination (R2) are then calculated to validate the accuracy of our work.
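
The four validation metrics named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's dataset or code.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Illustrative values only (hypothetical, not from the paper's dataset).
simulated = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(simulated, predicted)
```

RMSE is simply the square root of MSE, so the two always rank models identically; reporting both mainly helps when comparing with studies that quote only one of them.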

2. Problem statement
By its nature, machine learning needs relevant historical data to learn from and to predict. The accuracy of a machine learning model depends on what it has learned and how. When machine learning is used for daylight analysis, the input data can be separated into external and internal parameters [19]. External parameters are the fixed design contexts, including weather data, time, site location, surrounding buildings or obstacles, etc., whereas internal design variables involve building layouts, windows, orientations, sensor positions, etc. The internal design variables are the factors that characterize a specific building design.
As mentioned before, sensors on the analysis grid are the critical element in the CBD method. However, the number of sensors and their positions relative to surrounding obstacles differ between building layouts. Previous studies determined the relationship between sensors and buildings by their x, y coordinates and distance to windows [25,26]. This method could perform well for a specific building, but it was unsuitable for predicting various layouts. As shown in Fig. 2, the Smn sensors in the two cases have the same x and y coordinate values and the same distance to the windows, although the windows are at different locations. Although the input data are identical, the simulation results of the two cases are different. This is why previous machine learning approaches could not learn and predict accurately across different building layouts.
To overcome this shortcoming, we introduce a novel method to determine the relationship between sensors and surrounding obstacles in buildings that is not based on the coordinates of the sensors.

3. Methods
3.1. Overall workflow
The main idea behind this research was to create a tool that uses artificial intelligence to predict daylight performance in real time. As shown in Fig. 3a and Fig. 3b, this tool would be built as an online AI platform that allows designers to quickly create concept models and get daylight validation in seconds. The tool should provide a simple 3D drawing environment (a web-based design canvas) in which users design a building in low detail and choose the required simulation outcomes. It is suitable for energy analysis purposes. Based on the given building, all the design parameters are streamed to an online AI model, which predicts the result immediately.
To develop this tool, an ML model that can predict daylight performance for many different building layouts needs to be built. This section describes the research workflow, which includes four steps, as shown in Fig. 4. The whole process combines 3D modeling software, a daylighting simulation tool, and a machine learning model. This article focuses on the results from Step 1 to Step 3 because those are the critical parts of the method. If the outcome is positive, Step 4 will follow in further research. Fig. 6 in Section 4 illustrates all details from Step 1 to Step 3.

Fig. 2. The old method for determining the position of sensors in building layouts.

Fig. 3a. The idea of developing an AI online platform for real-time daylight validation.

Fig. 3b. Workflow of using an AI online platform for real-time daylight validation.

i. Step 1: Dataset preparation

a. 3D modeling, specifying the input data
All the data needed for the ML model are obtained from a daylight simulation process. The investigated building is modeled in Rhinoceros 3D, a NURBS modeling tool, with a specific location and weather input. Because the CBD method relies on sensor points to calculate and deliver the results, the current study considers the relationship between the design contexts and the sensor points. The parametric model is built with the Grasshopper plugin, a graphical algorithm editor inside Rhinoceros.
b. Daylight simulation, collecting the output data

Among the 50 simulation tools mentioned by Ayoub (2020), any tool that uses the CBD method is suitable [19]. In this workflow, however, the preferable tools are those integrated with Grasshopper. Note that the framework is not limited to any particular simulation tool. Thanks to its reliability and ease of use, DIVA, a well-known daylight simulation tool distributed by Solemma LLC, is used with the design variables to calculate the daylight metrics at specific sensor points [27]. DIVA is a high-performance simulation tool used in many studies [2,6,15,28]. DIVA runs the simulation and produces the results based on values chosen automatically from the input domains. The simulations' inputs and outputs are collected and streamed into a comma-separated values (.csv) file, which serves as the ML model's data in the next step.
ii. Step 2: Build a machine learning model
With the data from the previous step, the ML model is constructed and the dataset is split into a train-set and a test-set with a ratio of 80/20. The ML model learns from the training data and optimizes itself for better accuracy over time.
iii. Step 3: Use the machine learning model to predict unseen data
After the training, the ML model is ready to predict the unseen data. With the new inputs, it calculates daylight metric values of
every sensor. Then all the data are consolidated to represent the 3D or 2D results.

iv. Step 4: Develop a tool for real-time daylight prediction
Based on the well-trained ML model, a tool for real-time daylight prediction is developed as a plugin inside Grasshopper or as an online platform. This tool could be trained further by its users.

Fig. 4. Framework for developing a real-time daylight performance validation.

3.2. Daylight metric – useful daylight illuminance

Useful Daylight Illuminance (UDI) was first introduced as a daylight metric in 2005 [29]. As mentioned above, the UDI is calculated by a climate-based method that uses the hourly sun and sky conditions from a weather dataset. It is commonly used and has proved reliable and effective in evaluating a building's daylight performance [5,30].
The UDI represents the percentage of time during which the illuminance falls within a specific range [29]. The UDI metric contains different ranges according to the level of daylight illuminance: UDIUseful from 100 lux to 2000 lux (recently updated to 3000 lux), UDIUnderlit lower than 100 lux, and UDIOverlit higher than 3000 lux. UDIUseful describes the useful daylight illuminance level, UDIUnderlit indicates a lack of daylight, and UDIOverlit indicates too much daylight [31]. UDIOverlit can cause visual discomfort such as glare, or wasted energy [29,32].
However, the UDIUseful, UDIUnderlit, and UDIOverlit ranges are too wide, so the results are often saturated. In 2012, Mardaljevic introduced four new ranges of UDI: UDIfell-short (UDI-f), lower than 100 lux; UDIsupplementary (UDI-s), from 100 lux to 300 lux; UDIautonomous (UDI-a), greater than 300 lux and lower than 3000 lux; and UDIexceeded (UDI-e), greater than 3000 lux [33]. The UDI with four bins was used as the daylighting criterion in this study, as shown in Table 1. These four ranges were chosen because together they fully describe the daylight status of a space: designers can easily see which areas lack daylight, which have enough, and which have too much.

Table 1
Four ranges of the Useful Daylight Illuminance (UDI).

Daylight criteria            Unit   Illuminance received at sensors
UDIfell-short (UDI-f)        %      <100 lux
UDIsupplementary (UDI-s)     %      ≥100 lux and ≤300 lux
UDIautonomous (UDI-a)        %      >300 lux and <3000 lux
UDIexceeded (UDI-e)          %      ≥3000 lux
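
As an illustration of how the four bins in Table 1 turn hourly illuminance into percentages, the following sketch classifies a list of hourly lux values at one sensor. It is not the script used in the study: the function name `udi_bins`, the sample values, and the choice to count every supplied hour (with no occupancy filter) are assumptions made here for clarity.

```python
def udi_bins(hourly_lux):
    """Percentage of hours in each UDI range, using the Table 1 thresholds."""
    if not hourly_lux:
        raise ValueError("hourly_lux must not be empty")
    counts = {"UDI-f": 0, "UDI-s": 0, "UDI-a": 0, "UDI-e": 0}
    for lux in hourly_lux:
        if lux < 100:          # fell-short: below 100 lux
            counts["UDI-f"] += 1
        elif lux <= 300:       # supplementary: 100 to 300 lux inclusive
            counts["UDI-s"] += 1
        elif lux < 3000:       # autonomous: above 300 and below 3000 lux
            counts["UDI-a"] += 1
        else:                  # exceeded: 3000 lux and above
            counts["UDI-e"] += 1
    total = len(hourly_lux)
    return {name: 100.0 * count / total for name, count in counts.items()}

# Hypothetical hourly illuminance values (lux) at a single sensor.
sample = [50, 100, 300, 301, 2999, 3000, 4500, 80]
result = udi_bins(sample)
```

Because the four ranges partition all possible illuminance values, the four percentages at a sensor always sum to 100.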

Fig. 5. Diagram of an ANN model.

3.3. Machine learning model - Artificial Neural Network

The Artificial Neural Network (ANN), inspired by the human nervous system, is one of the most recent ML models. It has been used to solve many complex problems in a variety of fields [34–37]. As a black-box method, the model predicts the output from the input data [19,38]. An ANN model has three parts: an input layer that receives the input, hidden computational layers, and an output layer that predicts the result [19], as shown in Fig. 5. The ANN model automatically optimizes itself by adjusting the weights inside the hidden layers to minimize the difference between the actual and predicted values. In this study, the ANN model is used to predict the outputs. Fig. 5 describes a typical ANN model with n hidden layers. In this model, information is transmitted from the input layer through the n hidden layers to the output layer. Within a given layer, a node's value is the sum of the outputs of the nodes in the previous layer, each multiplied by its corresponding weight. This value is then passed through an activation function such as linear, sigmoid, or ReLU, which determines the output passed on to the next layer. This process is described mathematically as follows [39]:
$$y_j^p = f\left(x_j^p\right) = f\left(\sum_{i=1}^{n_{p-1}} w_{ji}^{p-1} x_i^{p-1} + \theta_j^p\right) \qquad (1)$$

where $x_j^p$ and $y_j^p$ are the input and output of the activation function $f$ at the $j$th node in the $p$th layer, $w_{ji}^{p-1}$ is the weight that adjusts the value of the $i$th node in the $(p-1)$th layer, and $\theta_j^p$ is the bias of the $j$th node in the $p$th layer. The activation function used in this study was the Rectified Linear Unit (ReLU), described by the following equation:

$$\mathrm{ReLU}(x) = \max(x, 0) \qquad (2)$$
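
Equations (1) and (2) can be illustrated with a single-layer forward pass in plain Python. This is a minimal sketch rather than the network used in the study: the names `relu` and `layer_forward`, and the weights and biases below, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relu(x):
    """Eq. (2): ReLU(x) = max(x, 0)."""
    return max(x, 0.0)

def layer_forward(prev_outputs, weights, biases, activation=relu):
    """Eq. (1): for every node j, apply f to (sum_i w_ji * x_i + theta_j).

    weights[j][i] is the weight from node i of the previous layer to node j.
    """
    outputs = []
    for w_row, theta in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(w_row, prev_outputs)) + theta
        outputs.append(activation(pre_activation))
    return outputs

# Hypothetical weights and biases, chosen only to illustrate the arithmetic.
inputs = [1.0, 2.0]
hidden = layer_forward(inputs, weights=[[0.5, -1.0], [1.0, 1.0]], biases=[0.0, -1.0])
```

Here the first node receives 0.5·1 − 1·2 + 0 = −1.5, which ReLU clips to zero, while the second receives 1 + 2 − 1 = 2 and passes through unchanged. Stacking several such calls, each feeding the next, gives the multi-layer network of Fig. 5.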

4. Case study
As mentioned in the previous section, this paper presents the results from Step 1 to Step 3, which are the critical parts of the underlying method. The study therefore currently covers only the feasibility of using ML to predict the UDI for many different building layouts. To train the ANN model, a parametric model of simple buildings was developed to generate 400 different layouts. This section describes the detailed research content from Step 1 to Step 3, as shown in Fig. 6.

Fig. 6. Details of the research workflow from Step 1 to Step 3.

4.1. The first step: dataset preparation

4.1.1. 3D modeling, specifying the input data
Fig. 7 illustrates how different building layouts were developed based on four regular square plans of 16 m² each. The four surrounding squares can shift clockwise from the four corners A, B, C, D to the midpoint of each edge of the center square. The positions of the four squares were therefore determined by four variables ranging from 0 mm to 2000 mm, as described in Table 2. The union of the four shifted squares formed the building layout. Fig. 8 presents several typical building floor plans from the 400 cases used for ANN training. For every building layout, windows were randomly distributed with varying lengths and numbers. To reduce the number of cases, the windows were fixed at a sill height of 1250 mm and a window height of 1200 mm, while the ceiling height was 2700 mm.

Fig. 7. The method for creating parametric models.

Table 2
Variables for creating building layout.

Parameter   Unit   Value
A           mm     [0..2000]
B           mm     [0..2000]
C           mm     [0..2000]
D           mm     [0..2000]
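
Drawing random layout cases from the ranges in Table 2 might look like the sketch below. The sampling scheme is an assumption: the source does not state the distribution, the step size, or any seed, so uniform integer millimetre values and a fixed seed are used here purely for reproducibility, and the function name `sample_layout_cases` is hypothetical.

```python
import random

def sample_layout_cases(n_cases, low=0, high=2000, seed=0):
    """Draw n_cases random (A, B, C, D) shift values in millimetres.

    Uniform integer sampling and the seed are assumptions for illustration.
    """
    rng = random.Random(seed)
    return [
        {name: rng.randint(low, high) for name in ("A", "B", "C", "D")}
        for _ in range(n_cases)
    ]

# 400 cases, matching the number of layouts used for training in the study.
cases = sample_layout_cases(400)
```

Each dictionary would then drive the parametric model to produce one building layout.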

Fig. 8. Several building layouts.

The assumed building faced south and was located in Ho Chi Minh City, Vietnam, with specific weather data conditions. Ho Chi Minh City's weather is characterized by a dry season from December to April and a rainy season from May to November. The average temperature is about 27.5 °C, and the highest is 29.3–35 °C in April [40].
The work plane was set at a height of 800 mm above the floor. A 600 × 600 mm grid with a light sensor at the center of each cell was set up on the work plane for daylight analysis. Every sensor Si received direct light from outside through windows and reflected light from inside. For that reason, it was essential to evaluate the relationship between sensors, windows, and the building's obstacles. This research proposes a novel method for evaluating those relationships that can serve as a general solution for many different building layouts. The proposed method includes three groups of design variables.
Firstly, the design input xi corresponds to the distance from the sensor to an obstacle, as shown in Fig. 9. To determine the positions of the walls relative to a specific sensor, a number of rays were cast from the sensor at equal angular intervals around a full circle. When a ray intersected an obstacle, it returned the distance from the sensor to that obstacle; when it passed through a window, the length was zero. Therefore, the group {x1, x2, x3, …, xn | n is the number of rays} describes the relationship between a sensor and its surrounding obstacles.
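
The ray-based input x can be sketched as a 2D ray casting routine. This is an illustrative reconstruction, not the Grasshopper definition used in the study: the boundary representation as `(p1, p2, is_window)` segments, the function names, and the rule that the nearest hit decides whether a ray counts as a window are all assumptions.

```python
import math

def ray_segment_distance(origin, angle, p1, p2):
    """Distance along a ray from origin to segment p1-p2, or None if missed."""
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-9 * math.hypot(ex, ey):   # ray is parallel to the segment
        return None
    qx, qy = p1[0] - ox, p1[1] - oy
    t = (qx * ey - qy * ex) / denom   # distance travelled along the ray
    s = (qx * dy - qy * dx) / denom   # relative position on the segment (0..1)
    if t > 1e-9 and -1e-9 <= s <= 1 + 1e-9:
        return t
    return None

def ray_lengths(sensor, segments, n_rays):
    """x_1..x_n: distance to the nearest obstacle per ray, 0 if it is a window."""
    lengths = []
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays        # equal angular spacing
        nearest, nearest_is_window = None, False
        for p1, p2, is_window in segments:
            dist = ray_segment_distance(sensor, angle, p1, p2)
            if dist is not None and (nearest is None or dist < nearest):
                nearest, nearest_is_window = dist, is_window
        lengths.append(0.0 if nearest is None or nearest_is_window else nearest)
    return lengths

# Hypothetical 4 m x 4 m room (mm); the east wall is treated as one window.
room = [
    ((0, 0), (4000, 0), False),
    ((4000, 0), (4000, 4000), True),
    ((4000, 4000), (0, 4000), False),
    ((0, 4000), (0, 0), False),
]
x = ray_lengths((2000, 2000), room, n_rays=4)
```

With the sensor at the room center and four rays, the ray pointing east meets the window and returns zero, while the other three return the 2000 mm distance to the walls. Increasing `n_rays` to 40, 60, 80, or 100 gives the input vector sizes compared in the study.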

Fig. 9. Parameters x.

Fig. 10. Parameter d.

Rays of zero length indicated the position of a window, but not its size. The second design input, dmi, was therefore the distance from a sensor to one corner of a specific window, as shown in Fig. 10. As a result, the group {dm1, dm2, dm3, dm4 | m denotes the mth window} gives the distances between sensor Si and the four corners of the mth window.

Fig. 11. Parameter w.

Table 3
The inputs of the dataset.

Parameter   Unit   Value                    Variable
x                  n = [40, 60, 80, 100]    x1, x2, x3, …, xn
d           mm     m = [1, 2, 3, 4]         dm1, dm2, dm3, dm4
w           rad    m = [1, 2, 3, 4]         wm1, wm2, wm3, wm4

Lastly, the direction of each window relative to sensor Si needed to be specified. The third design input, wmi, was the angle between the vector Y starting from sensor Si and the projection of the corresponding dmi line onto the floor plane. The group {wm1, wm2, wm3, wm4 | m denotes the mth window} determines the directions of the mth window, as shown in Fig. 11.
Table 3 describes the inputs of the dataset, which involve the three parameters x, d, and w. The number of rays n for parameter x took four values, 40, 60, 80, and 100, to compare the outcomes obtained with different numbers of rays.
In this study, the maximum number of windows was limited to four, which was reasonable given the medium size of the assumed buildings. In cases with fewer than four windows, the parameters d and w of the missing windows were set to zero, because the training inputs needed to be vectors of equal dimension.
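
The window inputs d and w, including the zero padding for missing windows, can be sketched as follows. Several details are assumptions made for illustration: d is taken as the 3D straight-line distance from the sensor to each corner, w is computed as a signed angle from the +Y axis with `atan2`, and the function name `window_features` and the sample window are hypothetical.

```python
import math

MAX_WINDOWS = 4  # the study limits each layout to four windows

def window_features(sensor, windows):
    """Return the flat vectors d and w (four values per window, zero-padded).

    sensor is (x, y, z) in mm; each window is a list of four (x, y, z) corners.
    """
    sx, sy, _ = sensor
    d, w = [], []
    for corners in windows[:MAX_WINDOWS]:
        for cx, cy, cz in corners:
            # Assumed: d is the 3D distance from the sensor to the corner.
            d.append(math.dist(sensor, (cx, cy, cz)))
            # Angle between the +Y axis and the line's projection on the floor
            # plane; the signed atan2 convention is an assumption.
            w.append(math.atan2(cx - sx, cy - sy))
    missing = MAX_WINDOWS - min(len(windows), MAX_WINDOWS)
    d.extend([0.0] * (4 * missing))   # pad so every sample has 16 values
    w.extend([0.0] * (4 * missing))
    return d, w

# Hypothetical sensor on the 800 mm work plane and one window 2 m to the north,
# with a 1250 mm sill and 1200 mm height as in the case study.
sensor = (0.0, 0.0, 800.0)
window = [(-500.0, 2000.0, 1250.0), (500.0, 2000.0, 1250.0),
          (500.0, 2000.0, 2450.0), (-500.0, 2000.0, 2450.0)]
d, w = window_features(sensor, [window])
```

Whatever the number of windows, both vectors have 16 entries, which keeps the network's input dimension fixed across layouts.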

4.1.2. Daylight simulation, collecting the output data

From the design variables in Table 2, 400 cases were randomly sampled within their ranges. These cases formed the dataset for training
and testing the ML model. In every case, a set of design inputs comprised the parameters x, d, and w, as shown in Table 3. DIVA took
the inputs from the dataset and simulated all the cases sequentially to produce the daylight illuminance. However, DIVA did not
deliver the four UDI ranges (UDI-f, UDI-s, UDI-a, UDI-e) directly. Therefore, a Python script was developed and interfaced with
Grasshopper to calculate those outputs from the hourly data at every sensor. Each case had a different set of design inputs and a different
number of sensors. Because the sensor points were at different positions, the UDI-f, UDI-s, UDI-a, and UDI-e values they received differed. The
simulated values at every sensor point were recorded as the outputs for the dataset. Table 4 describes the materials of the building
parts. The configuration of the simulation process is shown in Fig. 12.
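The post-processing step, turning hourly illuminance at one sensor into four UDI percentages, might look like the sketch below. It is not the authors' Grasshopper script. The thresholds of 100, 300, and 3000 lx, the treatment of values exactly on a boundary, and the assumption that the hourly list already contains only the hours of interest are all illustrative choices; the thresholds are exposed as parameters so they can be replaced by the limits used in the study.

```python
def udi_percentages(hourly_lux, low=100.0, mid=300.0, high=3000.0):
    """Percentage of hours in each UDI range for one sensor.

    low, mid, high are assumed thresholds in lux, not values from the text.
    """
    if not hourly_lux:
        raise ValueError("hourly_lux must contain at least one value")

    counts = {"UDI-f": 0, "UDI-s": 0, "UDI-a": 0, "UDI-e": 0}
    for lux in hourly_lux:
        if lux < low:
            counts["UDI-f"] += 1   # fell-short
        elif lux < mid:
            counts["UDI-s"] += 1   # supplementary
        elif lux <= high:
            counts["UDI-a"] += 1   # autonomous
        else:
            counts["UDI-e"] += 1   # exceeded

    total = len(hourly_lux)
    return {name: 100.0 * count / total for name, count in counts.items()}


# Four made-up hourly readings, one per range.
sample_hours = [50.0, 150.0, 500.0, 4000.0]
result = udi_percentages(sample_hours)
print(result)
```

Because every hour falls into exactly one range, the four percentages for a sensor always sum to 100.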
Each simulation delivered outputs of UDI-f, UDI-s, UDI-a, UDI-e for all sensors. However, the ML model needed to be trained and

Table 4
The materials of building parts.

Building part                  Property        Value
Interior wall                  Reflectance     80%
Interior ceiling               Reflectance     70%
Interior floor                 Reflectance     20%
Ground                         Reflectance     20%
Glazing (double pane, clear)   Transmittance   80%


Fig. 12. Configuration of DIVA simulation.

only predict values at a single sensor. Therefore, the dataset needed to be split by the position of every sensor. The final dataset
for training the ML model had 66,299 samples (sensors). Another 50 cases, comprising 13,260 samples, were used as the unseen dataset for
validating the ML model’s performance.

4.2. The second step: build the machine learning model

4.2.1. Artificial neural network (ANN)
The ANN model used in this research was tested with three different architectures of 108, 120, and 160 hidden nodes, as
shown in Fig. 13 and Table 5. The model was created in a Colab notebook using TensorFlow 2.6.0. The dataset was split into two
separate subsets: 80% for the train-set and the remaining 20% for the test-set. To simplify calculation, all variables were standardized. A
dropout layer and an early stopping function were added to the ANN to prevent overfitting [41].
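The data preparation described here (an 80/20 split followed by standardization) can be sketched with the Python standard library alone. This is an illustration under assumptions, not the notebook used in the study: the shuffling, the fixed seed, fitting the mean and standard deviation on the train-set only, and the helper names are all choices made for the example.

```python
import random
import statistics


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out test_fraction of them."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


def fit_standardizer(rows):
    """Column-wise mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; use 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (value - mean) / std to every column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data: ten samples with three features (the last one constant).
rows = [[float(i), float(2 * i), 5.0] for i in range(10)]
train_rows, test_rows = train_test_split(rows)
means, stds = fit_standardizer(train_rows)
train_scaled = standardize(train_rows, means, stds)
test_scaled = standardize(test_rows, means, stds)
print(len(train_rows), len(test_rows))
```

Reusing the train-set statistics for the test-set keeps information from the held-out samples out of the training inputs, which is the usual reason for fitting the scaler on the train-set only.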

4.2.2. Evaluation metrics for the ANN model

The coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) were used to quantify the
difference between the actual values and the predicted ones [42]. Lower values of MAE and RMSE and higher values of R2 indicate
better performance of the ANN model. These metrics are defined in Eqs. (3)–(5):

\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right| \tag{3} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2} \tag{4} \]

Table 5
Three different ANN model’s architectures.

Number of hidden nodes 108 120 160

Table 6
Performance of three ANN’s architectures.

UDI range   Hidden nodes   RMSE    MAE (%)   R2
UDI-f (%)   108            6.738   2.844     0.889
UDI-f (%)   120            6.579   2.729     0.894
UDI-f (%)   160            6.462   2.690     0.898
UDI-e (%)   108            1.576   1.076     0.993
UDI-e (%)   120            1.271   0.815     0.996
UDI-e (%)   160            1.310   0.791     0.996
UDI-s (%)   108            7.939   4.146     0.899
UDI-s (%)   120            8.099   4.027     0.894
UDI-s (%)   160            8.321   4.173     0.889
UDI-a (%)   108            5.816   3.627     0.968
UDI-a (%)   120            5.426   3.272     0.972
UDI-a (%)   160            5.460   3.061     0.972


Table 7
Effect of number of rays on the ANN performance.

UDI range   dn    RMSE    MAE     R2
UDI-f       40    6.714   2.795   0.890
UDI-f       60    6.572   2.792   0.895
UDI-f       80    6.579   2.729   0.894
UDI-f       100   6.788   2.846   0.888
UDI-e       40    1.386   0.908   0.995
UDI-e       60    1.296   0.786   0.995
UDI-e       80    1.271   0.815   0.996
UDI-e       100   1.480   0.831   0.994
UDI-s       40    7.977   4.023   0.898
UDI-s       60    8.094   4.129   0.895
UDI-s       80    8.100   4.028   0.894
UDI-s       100   8.175   4.113   0.892
UDI-a       40    5.185   3.086   0.975
UDI-a       60    5.333   3.109   0.973
UDI-a       80    5.426   3.272   0.972
UDI-a       100   5.514   3.113   0.971

Table 8
ANN performance in each range of UDI.

Metrics RMSE MAE (%) R2 RMSE MAE (%) R2

UDI-f 6.510 2.690 0.892 6.714 2.795 0.890

UDI-e 1.344 0.892 0.995 1.386 0.908 0.995
UDI-s 7.627 3.801 0.898 7.977 4.023 0.898
UDI-a 4.844 2.962 0.977 5.185 3.086 0.975

\[ R^2 = 1 - \frac{\sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \tag{5} \]

where $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, $\bar{y}$ is the average value, and $N$ is the number of samples.
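A direct transcription of Eqs. (3)–(5) into Python is shown below as a minimal sketch. The function names and the four sample values are made up for illustration; in practice a library routine would normally be used instead.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, Eq. (3)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, Eq. (4)."""
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r2(actual, predicted):
    """Coefficient of determination, Eq. (5)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up UDI percentages at four sensors.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
r2_value = r2(actual, predicted)
print(mae_value, rmse_value, r2_value)
```

RMSE squares each error before averaging, so it penalizes a few large errors more than MAE does; comparing the two gives a rough sense of how uneven the errors are across sensors.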

4.3. The third step: use a machine learning model to predict unseen data
After the ANN model was trained, it could be used to predict new cases. Fifty random cases (the unseen dataset) were used to
validate the ANN model’s performance. The inputs of one case contained the relative positions of many sensor points, which is why there were
13,260 samples in the unseen dataset, as mentioned before. The ANN model returned all the predicted UDI-f, UDI-e, UDI-s, and UDI-a values of

Fig. 13. The ANN model for training and predicting.


Fig. 14. Comparison of predicted and simulation values.

13,260 samples as a single dataset, which was then split into specific cases. The predicted data were streamed into Grasshopper to
be represented in 3D and compared with the simulation results.
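Splitting one flat list of per-sensor predictions back into cases only requires the number of sensors in each case. The sketch below assumes that predictions are stored in case order and that the sensor counts are known; the helper name and the toy numbers are hypothetical.

```python
def split_by_case(predictions, sensors_per_case):
    """Cut a flat list of per-sensor predictions into one list per case."""
    if sum(sensors_per_case) != len(predictions):
        raise ValueError("sensor counts do not match the number of predictions")

    cases = []
    start = 0
    for count in sensors_per_case:
        cases.append(predictions[start:start + count])
        start += count
    return cases


# Toy example: six sensors spread over three cases.
# Each entry is (UDI-f, UDI-e, UDI-s, UDI-a) for one sensor.
flat_predictions = [
    (10.0, 0.0, 30.0, 60.0),
    (12.0, 1.0, 28.0, 59.0),
    (5.0, 40.0, 15.0, 40.0),
    (6.0, 35.0, 17.0, 42.0),
    (8.0, 20.0, 22.0, 50.0),
    (50.0, 0.0, 30.0, 20.0),
]
per_case = split_by_case(flat_predictions, sensors_per_case=[2, 3, 1])
print([len(case) for case in per_case])
```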

4.4. Results
To choose the best-performing ANN architecture, all three structures were tested with parameter d = 80. Table 6 describes the
performance of the three ANN architectures. Considering both the performance and the computation cost of those three models, the ANN
model with 120 hidden nodes was chosen as the most suitable one.

4.4.1. Effect of the number of rays on the ANN model’s performance

Among the three groups of design variables, only the number of rays, d, was adjustable. Choosing a suitable number of rays
could significantly reduce the time needed for preparing data and training. Besides, too many rays could be unnecessary and even make the
prediction less accurate. As shown in Table 7, UDI-s was predicted most accurately when d = 40, with RMSE, MAE, and R2 of
7.977, 4.023, and 0.898, respectively. The same held for UDI-a, with RMSE, MAE, and R2 of 5.185, 3.086, and 0.975, respectively.
In contrast, the model performed better when predicting UDI-f and UDI-e with d = 60 or d = 80. Considering both accuracy and
preparation time, d = 40 could be a good choice for the number of rays.
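The comparison in Table 7 can be reproduced with a short script that picks, for each UDI range, the number of rays with the lowest RMSE. The numbers are copied from Table 7; the dictionary layout and the helper name are assumptions made for this sketch.

```python
# (RMSE, MAE, R2) per number of rays, copied from Table 7.
TABLE_7 = {
    "UDI-f": {40: (6.714, 2.795, 0.890), 60: (6.572, 2.792, 0.895),
              80: (6.579, 2.729, 0.894), 100: (6.788, 2.846, 0.888)},
    "UDI-e": {40: (1.386, 0.908, 0.995), 60: (1.296, 0.786, 0.995),
              80: (1.271, 0.815, 0.996), 100: (1.480, 0.831, 0.994)},
    "UDI-s": {40: (7.977, 4.023, 0.898), 60: (8.094, 4.129, 0.895),
              80: (8.100, 4.028, 0.894), 100: (8.175, 4.113, 0.892)},
    "UDI-a": {40: (5.185, 3.086, 0.975), 60: (5.333, 3.109, 0.973),
              80: (5.426, 3.272, 0.972), 100: (5.514, 3.113, 0.971)},
}


def best_ray_count(results, metric_index=0, higher_is_better=False):
    """Return the number of rays with the best value of one metric."""
    chooser = max if higher_is_better else min
    return chooser(results, key=lambda rays: results[rays][metric_index])


best_by_rmse = {udi: best_ray_count(rows) for udi, rows in TABLE_7.items()}
print(best_by_rmse)
```

By RMSE, 40 rays is best for UDI-s and UDI-a, while UDI-f and UDI-e favor 60 and 80 rays, which matches the discussion of Table 7.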

4.4.2. Validation of the ANN model’s performance

The ANN model’s performance was validated based on its prediction accuracy on the unseen dataset. As shown in Table 8, the
ANN model performed well, with all R2 values at or above 0.890. The ANN model delivered its most accurate prediction
for UDI-e in the unseen dataset, with RMSE, MAE, and R2 of 1.386, 0.908, and 0.995, respectively, followed by UDI-a with RMSE, MAE,
and R2 of 5.185, 3.086, and 0.975, respectively. The accuracy was slightly lower for UDI-f and UDI-s. Fig. 14
illustrates the comparison of predicted and actual values.
The numerical data from the ANN prediction were streamed back into Grasshopper to be represented in 3D. After that, the daylight
performance of every case was compared with the simulation outcomes.


Fig. 15. Comparison of predicted and simulation values – 3D represented – Case 1.

Figs. 15–18 show comparisons of four random cases in the unseen dataset. The pictures on the left are the simulation results, while the
predicted outcomes are shown on the right. The four UDI ranges are shown in four different colors, with lower UDI values in
brighter shades and higher values in darker shades. Overall, the predicted values accurately reflected the daylight
performance of those buildings across the four UDI ranges.
As shown in Figs. 15–18, in the UDI-e cases, 10–15% of sensor points near windows have the highest values, from 50% to 100%.
In contrast, the 85–90% of sensor points far away from windows have values of 0 or 1. As a result, the train-set data
followed a pattern that was easy to learn, which is why the ML model achieved its most accurate prediction for UDI-e.
The remaining cases of UDI-a, UDI-f, and UDI-s had a wide range of simulation values at every sensor point, so it was
harder for the ML model to predict them as well as the UDI-e cases. However, the result for each case should not be judged at a single sensor point.
The outcome must cover a group of many sensor points in a particular floor plan (160 or more, depending on the size of the building) to
represent the daylight performance adequately. Therefore, even though the prediction and
simulation results differed at several sensor points, these differences did not change the character of the daylight performance of the building, as shown in
Figs. 15–18. Designers could still interpret the analysis result correctly. This means the results from the ANN model were
reliable and could inform decisions in the following design stages.
As mentioned before, the proposed method could be applied to other daylight metrics or simulation tools. For example, the model
could be trained to predict ASE or sDA values for checking against LEED v4 requirements. If the accuracy of the ANN model can be further improved,
it could replace repeated simulations in optimization problems.

4.4.3. The influence of design variables on ANN model’s performance

Three design parameters, d, x, and w, were considered. To evaluate the influence of those parameters on the ANN performance and determine
which one was the most important, each parameter was tested separately with the ANN model. The ANN model with 120 hidden nodes
was chosen with d = 40 for predicting UDI-s. According to Table 9, the ANN model with all three parameters returned the best
performance, with RMSE, MAE, and R2 of 8.094, 4.129, and 0.895, respectively. Parameter w clearly had a significant impact on the
accuracy of the ANN model: used alone, it gave RMSE, MAE, and R2 of 9.120, 4.749, and 0.866, respectively. When the ANN model used only parameter x as
the input, the prediction was the least accurate, with RMSE, MAE, and R2 of 11.494, 6.122, and 0.787, respectively.
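The single-parameter results in Table 9 can be ranked with a few lines of Python. The sketch below also expresses each single-parameter RMSE as a percentage increase over the model that uses all three parameters; that relative measure is an illustrative choice for this example, not a metric reported in the study.

```python
# (RMSE, MAE, R2) per input set, copied from Table 9.
TABLE_9 = {
    "x, w, d": (8.094, 4.129, 0.895),
    "w": (9.120, 4.749, 0.866),
    "x": (11.494, 6.122, 0.787),
    "d": (9.822, 5.400, 0.845),
    "w and d": (9.164, 4.513, 0.865),
}

single_parameters = ["w", "x", "d"]

# Lower RMSE when used alone means the parameter carries more information.
ranked = sorted(single_parameters, key=lambda name: TABLE_9[name][0])

full_rmse = TABLE_9["x, w, d"][0]
rmse_increase = {
    name: 100.0 * (TABLE_9[name][0] - full_rmse) / full_rmse
    for name in single_parameters
}

print(ranked)
for name in ranked:
    print(f"{name}: RMSE +{rmse_increase[name]:.1f}% vs. all parameters")
```

The ranking places w first and x last, in line with the statement that w alone gives the best single-parameter accuracy and x alone the worst.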


Fig. 16. Comparison of predicted and simulation values – 2D represented – Case 13.

5. Conclusions
This research presented an efficient framework for creating a machine learning model that could predict daylight performance for
many different building layouts. The results of this study confirmed the feasibility of an AI-based real-time
daylight validation platform.
The climate-based daylight metric used in this study was the Useful Daylight Illuminance with four ranges: UDIfell-short,
UDIsupplementary, UDIautonomous, and UDIexceeded. The daylight simulation process generated data for training the ANN model and validating the
prediction. Design inputs included the position of sensors relative to surrounding obstacles and windows. Data from 400 cases containing
66,299 samples were used to train the ANN model. Another 50 cases with 13,260 samples were the unseen data for prediction. After
the UDI-f, UDI-s, UDI-a, and UDI-e values were collected at every sensor in each case, they were streamed into Rhino and Grasshopper to
produce the 3D results. The final results could be represented in both 3D and 2D models for the best daylight description.
The performance of the present model was demonstrated by its RMSE, MAE, and R2 values. The most accurate results were
found when predicting UDI-e and UDI-a, with R2 of 0.995 and 0.975, while UDI-s and UDI-f were lower, with R2 of 0.898 and
0.890, respectively. Each design parameter was used as the only input for training the ANN model to determine the most influential
parameter on the model’s accuracy. The parameter w, which corresponded to the positions and sizes of windows, affected the ANN model’s
accuracy the most, while the number of rays affected it the least. After testing several ANN architectures, the one with 120 hidden nodes in three
hidden layers was the most suitable.
In conclusion, these are some main advantages of the proposed framework:
• The ANN model produced highly accurate predictions for many different building layouts.
• The positive results are promising for developing a tool for real-time daylight validation.
• The framework is not tied to any particular ML model, daylight metric, or simulation tool.
• The ANN model could replace the simulation process in optimization problems.


Fig. 17. Comparison of predicted and simulation values – 3D represented – Case 17.

Although this study showed the promise of the method, some drawbacks need to be mentioned. Firstly, the assumed
buildings were generated at medium size, so the ANN model could not be expected to work properly with larger ones. Secondly, only one
orientation and one building location were considered, to reduce the number of samples. Lastly, this study used the default
materials for the building to speed up the simulation process and focused only on finding a suitable method. Although double-clear
glazing may be an uncommon choice for glazed façades in office buildings, it is acceptable for this study. Besides, when
using this platform for real-time simulation, users could choose among many material options, especially glass types and even the façade,
which significantly affect the simulation results. Therefore, future research needs to consider more design factors that affect
the results at sensor points. In particular, the following aspects can be further investigated: interior walls, different locations
(weather conditions) and orientations, multi-story buildings, building voids, different materials, and even simple fenestration.
Once the ANN model can predict any building layout, it could be developed into a tool for real-time prediction of daylight performance.

Author statement
Luan Le-Thanh: Investigation, Methodology, Validation, Writing - original draft. Ha Nguyen-Thi-Viet: Validation, Resources,
Writing - original draft. Jaehong Lee: Writing - review & editing. H. Nguyen-Xuan: Conceptualization, Methodology, Supervision,
Writing - review & editing.


Fig. 18. Comparison of predicted and simulation values – 2D represented – Case 26.

Table 9
Influence of parameters on the ANN performance.

Parameters   RMSE     MAE     R2
x, w, d      8.094    4.129   0.895
w            9.120    4.749   0.866
x            11.494   6.122   0.787
d            9.822    5.400   0.845
w and d      9.164    4.513   0.865

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Acknowledgments
The first and the second authors would like to thank Van Lang University for research funding. The first author acknowledges the
support of European Commission H2020-MSCA-RISE BESTOFRAC project for duration of secondment at University of Padua, Italy. The
last author acknowledges the support of the Alexander von Humboldt Foundation for a Digital Cooperation Fellowship and Quy Nhon
University through the master program in Applied Data Science sponsored by Vingroup Innovation Foundation (VINIF) in project code
VINIF.2020.JM01, Viet Nam.


References
[1] N.E. Klepeis, W.C. Nelson, W.R. Ott, J.P. Robinson, A.M. Tsang, P. Switzer, et al., The National Human Activity Pattern Survey (NHAPS): a resource for assessing
exposure to environmental pollutants, J. Expo. Anal. Environ. Epidemiol. 11 (2001) 231–252, [Link]
[2] L. Le-Thanh, T. Le-Duc, H. Ngo-Minh, Q.-H. Nguyen, H. Nguyen-Xuan, Optimal design of an Origami-inspired kinetic façade by balancing composite motion
optimization for improving daylight performance and energy efficiency, Energy 219 (2021) 119557, [Link]
[3] C. Giarma, K. Tsikaloudaki, D. Aravantinos, Daylighting and visual comfort in buildings’ environmental performance assessment tools: a critical review,
Procedia Environ. Sci. 38 (2017) 522–529, [Link]
[4] M. Ayoub, 100 Years of daylighting: a chronological review of daylight prediction and calculation methods, Sol. Energy 194 (2019) 360–390, [Link]
10.1016/[Link].2019.10.072.
[5] F. Cantin, M.C. Dubois, Daylighting metrics based on illuminance, distribution, glare and directivity, Light. Res. Technol. 43 (2011) 291–307, [Link]
10.1177/1477153510393319.
[6] C.F. Reinhart, Daylighting Handbook II, Building Technology Press, 2018.
[7] P.R. Tregenza, I.M. Waters, Daylight coefficients, Light. Res. Technol. 15 (1983) 65–71, [Link]
[8] D. Bourgeois, C.F. Reinhart, G. Ward, Standard Daylight Coefficient Model for Dynamic Daylighting Simulations, vol. 36, 2008, [Link]
09613210701446325.
[9] Y. Bian, Y. Ma, Analysis of daylight metrics of side-lit room in Canton, south China: a comparison between daylight autonomy and daylight factor, Energy Build
138 (2017) 347–354, [Link]
[10] C.F. Reinhart, S. Herkel, The simulation of annual daylight illuminance distributions — a state-of-the-art comparison of six RADIANCE-based methods, Energy
Build 32 (2000) 167–187, [Link]
[11] J. Mardaljevic, Simulation of annual daylighting profiles for internal illuminance, Light. Res. Technol. 32 (2000) 111–118, [Link]
096032710003200302.
[12] C.F. Reinhart, J. Mardaljevic, Z. Rogers, Dynamic daylight performance metrics for sustainable building design, LEUKOS - J. Illum. Eng. Soc. North Am. 3 (2006)
7–31, [Link]
[13] S. Samadi, E. Noorzai, L.O. Beltrán, S. Abbasi, A computational approach for achieving optimum daylight inside buildings through automated kinetic shading
systems, Front. Archit. Res. (2019), [Link]
[14] R.A. Mangkuto, M. Rohmah, A.D. Asri, Design optimisation for window size, orientation, and wall reflectance with regard to various daylight metrics and
lighting energy demand: a case study of buildings in the tropics, Appl. Energy 164 (2016) 211–219, [Link]
[15] A. Wagdy, A. Sherif, H. Sabry, R. Arafa, I. Mashaly, Daylighting simulation for the configuration of external sun-breakers on south oriented windows of hospital
patient rooms under a clear desert sky, Sol. Energy 149 (2017) 164–175, [Link]
[16] T. Le-Duc, Q. Nguyen, H. Nguyen-Xuan, Balancing composite motion optimization, Inf. Sci. 520 (2020) 250–270, [Link]
[17] P. Bakmohammadi, E. Noorzai, Optimization of the design of the primary school classrooms in terms of energy and daylight performance considering occupants’
thermal and visual comfort, Energy Rep. 6 (2020) 1590–1607, [Link]
[18] N. Tarek Abdelraouf Esmael, S. Sadek Hosny, H. Mostafa Kamal Sabry, S. Morad Abdelmohsen, A biophilic approach for optimizing daylighting performance
and views-out in intensive care units using combined light shelf, Eng. Res. J. 165 (2020) 57–77, [Link]
[19] M. Ayoub, A review on machine learning algorithms to predict daylighting inside buildings, Sol. Energy 202 (2020) 249–275, [Link]
solener.2020.03.104.
[20] C.-L. Lorenz, A.B. Spaeth, C. Bleil De Souza, M. Packianather, Machine Learning in Design Exploration: an Investigation of the Sensitivities of ANN-Based
Daylight Predictions, CAAD Futur Conf Proc, 2019.
[21] C.L. Lorenz, A.B. Spaeth, C. Bleil de Souza, M.S. Packianather, Artificial Neural Networks for parametric daylight design, Architect. Sci. Rev. 63 (2020) 210–221,
[Link]
[22] T. Kazanasmaz, M. Günaydin, S. Binol, Artificial neural networks to predict daylight illuminance in office buildings, Build. Environ. 44 (2009) 1751–1757,
[Link]
[23] Y. Baştanlar, M. Ozuysal, Introduction to machine learning, Methods Mol. Biol. 1107 (2014) 105–128, [Link]
[24] C.L. Lorenz, M. Packianather, A. Benjamin Spaeth, C.B. De Souza, Artificial neural network-based modelling for daylight evaluations, in: Proc. 2018 Symp.
Simul. Archit. Urban Des. (SimAUD 2018) 50, Society for Modeling and Simulation International (SCS), 2018, pp. 8–15, [Link]
[Link].002.
[25] C.L. Lorenz, A.B. Spaeth, C. Bleil De Souza, M. Packianather, Input feature optimization for ANN models predicting daylight in buildings, CEUR Workshop Proc.
2394 (2019) 1–11.
[26] C.-L. Lorenz, W. Jabi, Predicting Daylight Autonomy Metrics Using Machine Learning, 2017.
[27] Solemma LLC | DIVA n.d. [Link] (Accessed 4 July 2020).
[28] A. Tabadkani, S. Banihashemi, M.R. Hosseini, Daylighting and visual comfort of oriental sun responsive skins: a parametric analysis, Build Simul. 11 (2018)
663–676, [Link]
[29] A. Nabil, J. Mardaljevic, Useful daylight illuminance: a new paradigm for assessing daylight in buildings, Light. Res. Technol. 37 (2005) 41–59, [Link]
10.1191/1365782805li128oa.
[30] A.A.S. Bahdad, S.F.S. Fadzil, N. Taib, Optimization of daylight performance based on controllable light-shelf parameters using genetic algorithms in the tropical
climate of Malaysia, J. Daylighting 7 (2020) 122–136, [Link]
[31] C.F. Reinhart, J. Wienold, The daylighting dashboard - a simulation-based design analysis for daylit spaces, Build. Environ. 46 (2011) 386–396, [Link]
10.1016/[Link].2010.08.001.
[32] A. Nabil, J. Mardaljevic, Useful daylight illuminances: a replacement for daylight factors, Energy Build 38 (2006) 905–913, [Link]
enbuild.2006.03.013.
[33] J. Mardaljevic, M. Andersen, N. Roy, J. Christoffersen, Daylighting metrics: is there a relation between useful daylight illuminance and daylight glare
probability?, 2012, pp. 189–196. Ibpsa-Engl Bso12.
[34] D.-K. Bui, T.N. Nguyen, T.D. Ngo, H. Nguyen-Xuan, An artificial neural network (ANN) expert system enhanced with the electromagnetism-based firefly
algorithm (EFA) for predicting the energy consumption in buildings, Energy 190 (2020) 116370, [Link]
[35] C. Buratti, M. Barbanera, D. Palladino, An original tool for checking energy performance and certification of buildings by means of Artificial Neural Networks,
Appl. Energy 120 (2014) 125–132, [Link]
[36] S.L. Wong, K.K.W. Wan, T.N.T. Lam, Artificial neural networks for energy analysis of office buildings with daylighting, Appl. Energy 87 (2010) 551–557,
[Link]
[37] S. Zhou, D. Liu, Prediction of daylighting and energy performance using artificial neural network and support vector machine, Am. J. Civ. Eng. Architect. 3
(2015) 1–8, [Link] 2015;3:1–8.
[38] R. Hecht-Nielsen, Theory of the backpropagation neural network (based on the paper by R. Hecht-Nielsen in Proceedings of the
International Joint Conference on Neural Networks 1, 593–611, June 1989, © 1989 IEEE), Academic Press, Inc., 1992, [Link]
741252-8.50010-8.
[39] D.T.T. Do, H. Nguyen-Xuan, J. Lee, Material optimization of tri-directional functionally graded plates by using deep neural network and isogeometric multimesh
design approach, Appl. Math. Model. 87 (2020) 501–533, [Link]

[40] \climatewebsite\WMO_Region_2_Asia\VNM_Vietnam n.d.[Link] (accessed October 20, 2020).
[41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res.
15 (2014) 1929–1958.
[42] J.M. Twomey, A.E. Smith, Performance measures, consistency, and power for artificial neural network models, Math. Comput. Model. 21 (1995) 243–258,
[Link]
