I am asking you to create PowerPoint slides and a doc file (3 pages, single spaced) explaining the research paper, following the PowerPoint slide format below.

Presentation Format

Introduction

What problem is discussed in the paper and why do we care?

What data was used in the study?

Methods/Techniques. How were these data analyzed?

What were their primary findings?

What are the implications of these findings?

Challenges/Shortcomings. Are there potential problems with this study, or are their assumptions reasonable? As scientists we are naturally skeptical of others' work. Which parts of this study are you skeptical about, and why?

Conclusion

Note: I have provided an example of a PowerPoint slide and a doc file, which gives an idea of what I am asking for. Also, the PowerPoint must include at least 3 figures, with an explanation of each one in the doc file.

On-Site Method for Early Earthquake Warning

"A P-Wave Based System"

Mundhir Alfarsi

Introduction

▪ In this presentation we will discuss the importance of EEW devices, the challenges they face, and the implications of their use.

▪ According to Colombelli (2015), early earthquake warning is a concept being integrated in most countries nowadays. These mechanisms are adopted to counter the destructive effects of earthquakes.

▪ Research shows that EEW monitoring devices are designed to detect an ongoing earthquake beforehand. After detecting potential earthquake movement, the system sends warnings to target areas for early mitigation plans.

▪ These systems were first developed by the National Research Institute for Earth Science and Disaster Prevention in conjunction with the Japan Meteorological Agency in 2007.

1. Problem: What problem is discussed in the paper and why do we care?

▪ Defining the effectiveness of an EEW system located at a given site in detecting earthquake waves and ground motions that originate far from the site.

▪ Defining thresholds for alarm warnings.

2. Data: What data was used in the study?

▪ Data obtained from previous earthquake events in Japan.

▪ 76 events were selected to develop the methodology.

▪ 73 of these events were used to draw empirical conclusions, while 3 events were used to validate the results obtained.

Figure 1: Events data distribution.

3. Methods/Techniques: How were these data analyzed?

▪ A quantitative method of data collection was used to collect the data samples.

Figure 2: Example of vertical components.

4. Findings: What were the primary findings?

▪ The primary findings show that when all variables are used concurrently, a more reliable and significant alarm performance is achieved.

▪ They also show the ability of the alarm system to rapidly provide warnings.

Figure 3: Overall performance of the method used.

6. Challenges/Shortcomings: Are there potential problems with this study, or are their assumptions reasonable? As scientists we are naturally skeptical of others' work. Which parts of this study are you skeptical about, and why?

▪ The challenge of estimating or predicting ground-shaking levels with utmost certainty.

▪ S-wave variables are not taken into account when computing the threshold limits of the EEW system.

Conclusion

▪ In conclusion, despite the system's shortcomings in detecting larger events, it has advantages beyond the main one, which is detecting possible ground motions within a given radius of targeted areas. The EEW system can also be tuned at the user's discretion.

▪ Secondly, the proposed methodology can provide shorter or longer warning and declaration times when compared to other standard approaches in use.

References

▪ Colombelli, S. (2015). A P-wave-based on-site method for early earthquake warning. AGU Publications, pp. 1390–1398.

Intro:

Hi everyone, today I am going to talk about "A P-Wave Based System: On-Site Method for Early Earthquake Warning," which was discussed in a geophysics research paper. This presentation will discuss the importance of early earthquake warning devices, the challenges they face, and the implications of their use.

Earthquakes are natural hazards that occur as a result of movements within the Earth's crust or volcanic action, which create seismic waves. Earthquakes can cause huge destructive effects that can destroy a whole city. Thus, the paper discusses a technique developed to avoid the destructive effects of earthquakes. According to the paper, early earthquake warning is a concept being integrated in most countries nowadays. These mechanisms are adopted to counter the destructive effects of earthquakes.

The technique discussed in the paper is a developed early earthquake warning system: a real-time seismic monitoring system that detects an ongoing earthquake and provides a warning to the target area before the arrival of the most destructive waves, allowing early mitigation plans.

These systems were first developed by the National Research Institute for Earth Science and Disaster Prevention in conjunction with the Japan Meteorological Agency in 2007.

Problem:

The problem discussed in the paper is defining the effectiveness of these early earthquake warning systems, located at a given site, in detecting earthquake waves and ground motions that originate far from the site, and defining thresholds for alarm warnings.

According to the paper, one of the main problems is the adverse economic and physical losses that cities affected by earthquakes face. Some of the countries that suffer such events are Japan and Mexico, as a result of their sensitive geological locations.

Data:

The data used in this research were obtained from previous earthquake events in Japan. The data come from 76 events with magnitudes of 4.0–9.0 that took place in Japan. 73 of the 76 events were used as controls to establish empirical correlations, and the other 3 were used to validate the results obtained from the empirical calculations. Finally, a radius of 500 km was secured as a sample for each event, for approximately 12,792 km in total.

The map shows the distribution of stations and the epicentral locations of the selected events, represented by gray dots; the green stars are the earthquakes used for calibration, while the red stars are the scenario events.

Method:

The method used in this paper is based on two primary keys. The first is the continuous measurement of three peak amplitude parameters (the initial peaks of displacement, Pd, velocity, Pv, and acceleration, Pa) on the vertical component of ground motion recordings. The second is the use of an empirical combination of the three ground motion parameters to predict the ensuing peak ground velocity (PGV) at the same site. The observed parameters are compared to threshold values and converted into a single, dimensionless variable. A local alert is issued as soon as the empirical combination exceeds a given threshold.

The techniques used were quantitative in nature. Calculations were based on earthquake wave intensity in terms of the displacement (d), the velocity (v) of each successive wave, and the acceleration (a).

As for the thresholds, which are the amplitude limits for each parameter, these are represented by Px(l) and Px(h), indicating the lower and higher thresholds for the variable x (where x is d, v, or a).

For each parameter, a logical variable Wx(t) is computed from the current peak value Px(t), and the three variables are summed, as shown in these equations:

Wx(t) = 0                                           if Px(t) < Px(l)

Wx(t) = (1/3) · [Px(t) − Px(l)] / [Px(h) − Px(l)]   if Px(l) ≤ Px(t) < Px(h)

Wx(t) = 1/3                                         if Px(t) ≥ Px(h)

W(t) = Wa(t) + Wv(t) + Wd(t)
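The threshold logic above can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation, and the threshold pairs below are placeholders rather than the calibrated values from the study.

```python
# Sketch of the threshold-based alert variable. Threshold values are
# hypothetical, for illustration only.

def w_x(p_x, p_low, p_high):
    """Contribution of one peak parameter (Pd, Pv or Pa) to the alert level."""
    if p_x < p_low:
        return 0.0
    if p_x >= p_high:
        return 1.0 / 3.0
    return (1.0 / 3.0) * (p_x - p_low) / (p_high - p_low)

def alert_level(pd, pv, pa, thresholds):
    """Combine the three parameters into the single dimensionless variable W(t)."""
    return (w_x(pd, *thresholds["d"]) +
            w_x(pv, *thresholds["v"]) +
            w_x(pa, *thresholds["a"]))

# Hypothetical lower/upper thresholds for displacement, velocity, acceleration.
thr = {"d": (0.01, 0.1), "v": (0.001, 0.01), "a": (0.1, 1.0)}

# W(t) reaches 1 when every parameter exceeds its upper threshold; an alert
# is issued once W(t) exceeds a chosen decision level.
level = alert_level(0.2, 0.02, 2.0, thr)
```

Because each parameter contributes at most 1/3, an alert driven by all three parameters together is more robust than one triggered by a single noisy measurement.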

Here we have a figure showing an example of the vertical components. From top to bottom: the vertical component of the acceleration (black), velocity (blue), and displacement (orange) signals. The initial peak amplitude parameters Pd, Pv, and Pa are measured as the absolute maximum of each waveform in the early portion of the P wave. The threshold values for each parameter are schematically shown as a black dashed line on each record. Examples of Wa, Wv, and Wd as functions of time are also shown with solid black, blue, and orange lines, respectively. Each step in Wa, Wv, and Wd corresponds to an increase of the peak amplitude parameter (as an absolute value) on the ground motion records. Finally, the bottom plot shows the cumulative logical variable W(t) as a function of time (purple curve).

Findings:

From the data collected and the computations done, the primary findings were extracted from the percentages of successful alarm triggers. Figure 3 represents the performance of the method used in this paper. Panels (a, d) are histograms showing the performance of the system in terms of successful alerts (dark green bars), successful no-alerts (light green bars), false alerts (yellow bars), and missed alerts (red bars) for the two different intensity levels. Panels (b, e) show alert times as a function of distance, where the black line is the theoretical alert time for the fixed P-wave time window on-site system (3 s). Finally, panels (c, f) show lead times as a function of distance, where the black line is the theoretical lead time for the fixed P-wave time window on-site system, while the green line is the best-fit regression of the observed lead times as a function of distance.

The results in Figure 3 show that when all variables are used concurrently, a significant alarm performance is achieved. They also show the ability of the alarm system to rapidly provide warnings.

Challenges/Shortcomings:

One of the challenges, in my point of view, is estimating or predicting ground-shaking levels with utmost certainty, since the paper focuses only on P-wave time window variables for the computation of intensity while ignoring S-wave variables when computing the threshold limits of the early earthquake warning system. I believe that if S-wave variables are not taken into account at the initial stages, especially when larger events take place, the alarms might not be able to detect them.

Conclusion:

In conclusion, despite the system's shortcomings in detecting larger events, it has advantages beyond the main one, which is detecting possible ground motions within a given radius of targeted areas. The EEW system can also be tuned at the user's discretion. Secondly, the proposed methodology can provide shorter or longer warning and declaration times when compared to other standard approaches in use.


SEISMIC FACIES ANALYSIS USING MACHINE-LEARNING

T. Wrona1, Indranil Pan2, R.L. Gawthorpe3 and H. Fossen4

Right Running Head: MACHINE-LEARNING BASED FACIES ANALYSIS

1 Department of Earth Science, University of Bergen, Allégaten 41, N-5007 Bergen, Norway. E-mail: thilo.wrona@uib.no

2 Department of Earth Science and Engineering, Imperial College, Prince Consort Road, London, SW7 2BP, UK. E-mail: indranilpan@gmail.com

3 Department of Earth Science, University of Bergen, Allégaten 41, N-5007 Bergen, Norway. E-mail: rob.gawthorpe@uib.no

4 Department of Earth Science, University of Bergen, Allégaten 41, N-5007 Bergen, Norway. E-mail: haakon.fossen@uib.no

ABSTRACT

Seismic interpretations are, by definition, subjective and often require significant time and

expertise from the interpreter. We demonstrate that machine-learning techniques can help

address these problems by performing seismic facies analyses in a rigorous, repeatable way. For

this purpose, we use state-of-the-art 3D broadband seismic reflection data of the northern North

Sea. Our workflow includes five basic steps. First, we extract seismic attributes to highlight

features in the data. Second, we perform a manual seismic facies classification on 10 000

examples. Third, we use some of these examples to train a range of models to predict seismic

facies. Fourth, we analyze the performance of these models on the remaining examples. Fifth, we

select the 'best' model (i.e., highest accuracy) and apply it to a seismic section. As such, we

highlight that machine-learning techniques can increase the efficiency of seismic facies analyses.

INTRODUCTION

Seismic reflection data is a key source of information in numerous fields of geoscience,

including sedimentology and stratigraphy (e.g., Vail, 1987; Posamentier, 2004), structural

geology (Baudon and Cartwright, 2008; Jackson et al., 2014), geomorphology (e.g., Posamentier

and Kolla, 2003; Cartwright and Huuse, 2005; Bull et al., 2009) and volcanology (e.g., Hansen et

al., 2004; Planke et al., 2005; Magee et al., 2013). However, the often subjective and non-unique

interpretation of seismic reflection data has led to longstanding debates based on contrasting

geological interpretations of the same or similar data sets (e.g., Stewart and Allen, 2002;

Underhill, 2004). Moreover, seismic interpretations require significant amounts of time,

experience, and expertise from interpreters (e.g., Bond et al., 2012; Bond, 2015; Macrae et al.,

2016). We believe that machine-learning techniques can help the interpreters reduce some of

these problems associated with seismic facies analyses.

Machine-learning describes a set of computational methods that are able to learn from

data to make accurate predictions. Previous applications of machine-learning to seismic

reflection data focus on the detection of geological structures, such as faults and salt bodies (e.g.,

Hale, 2013; Zhang et al., 2014; Guillen et al., 2015; Araya-Polo et al., 2017; Huang et al., 2017)

and unsupervised seismic facies classification, where an algorithm chooses the number and types

of facies (e.g., Coléou et al., 2003; de Matos et al., 2006). While early studies primarily used

clustering algorithms to classify seismic data (e.g., Barnes and Laughlin, 2002; Coléou et al.,

2003), recent studies focus on the application of artificial neural networks (e.g., de Matos et al.,

2006; Huang et al., 2017). To demonstrate the strength of these advanced algorithms, this study

compares 20 different classification algorithms (e.g., K-nearest neighbor, Support Vector

Machines and Artificial Neural Networks).

While these unsupervised classification algorithms are, in theory, able to identify the

main seismic facies in a given data set, in practice it can be difficult to correlate these

automatically classified facies to existing geological units. This correlation can be done by

visualizing facies in a lower dimensional space (e.g., Gao, 2007) or self-organized maps (e.g.,

Coléou et al., 2003). As an alternative, we introduce a simple supervised machine-learning

workflow, where the user can define the number and type of seismic facies used for

classification. This approach avoids the correlation by allowing the user to adapt the workflow to

a given problem, where seismic facies can be based on existing geological units.

To demonstrate the advantages of this approach, we describe the application of

supervised machine-learning to a seismic facies analysis using 3D broadband seismic reflection

data of the northern North Sea. Our workflow consists of five basic steps. First, we extract

features from the data by calculating 15 seismic attributes. Second, we generate training data by

manually sorting 10 000 examples into four facies. Third, we train 20 models to classify seismic

facies using some of these examples. Fourth, we assess the performance of these models using

the remaining examples. Fifth, we select the 'best' model based on its performance and apply it

to a seismic section. Our results demonstrate that machine-learning algorithms are able to

perform seismic facies analyses, which are crucial to map sedimentary sequences, structural

elements, and fluid contacts.

3D SEISMIC REFLECTION DATA

Figure 1: Seismic section (courtesy of CGG) used for automated seismic facies analysis.

This study uses state-of-the-art 3D broadband seismic reflection data (CGG

Broadseis™) of the northern North Sea (Figure 1). The data covers an area of 35,410 km² and

was acquired using a series of up to 8-km-long streamers towed ~40 m deep. The data recording

extends to 9 s with a time sampling of 4 ms. Broadseis data covers a wide range of frequencies

reaching from 2.5 to 155 Hz (Firth et al., 2014). The binning size was 12.5 × 18.75 m. The data

was 3-D true amplitude Kirchhoff pre-stack time migrated. The seismic volume was zero-phase

processed with SEG normal polarity; i.e., a positive reflection (white) corresponds to an

acoustic-impedance increase with depth. The data was time-migrated using a pre-stack

algorithm.

MANUAL INTERPRETATION

Supervised machine-learning requires a subset of the data set for training and testing

models. We therefore select a reasonable number (10 000) of examples to perform a manual

seismic facies classification. This number represents a trade-off between the time required for

manual classification and the achieved model accuracy (>0.95). After testing different sizes

(from 10 × 10 to 500 × 500 samples), we selected an example size of 100 × 100 samples (Figure

2), which results in high model accuracies (>0.95). The classification follows standard schemes

for seismic facies developed based on numerous studies (e.g., Brown, 2004; Bacon et al., 2007;

Kearey et al., 2009). The four facies (i.e., classes) that we use for classification are: A)

continuous, horizontal reflections; B) continuous, dipping reflections; C) discontinuous, crisscrossing reflections; and D) discontinuous, chaotic reflections (Figure 2). These four are

probably the most common basic seismic facies. Since almost all geological structures show at

least one of these facies in seismic reflection data, classifying them accurately would allow us to

map a wide range of structures.

Figure 2: Representative examples of manual seismic facies classification of the four seismic facies chosen

for this study: a) continuous, horizontal reflections; b) continuous, dipping reflections; c) discontinuous,

crisscrossing reflections; and d) discontinuous, chaotic reflections. The horizontal axes show distance in

meters and the vertical axes show two-way traveltime in milliseconds. The number of examples (10 000) is

balanced across classes with each of the four classes containing the same number of examples (2500).

Seismic data courtesy of CGG.
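Supervised learning on the labelled examples starts with a split of the classified windows into training and test sets. A minimal numpy sketch, assuming the balanced 10 000-example set described above and an illustrative 80/20 split (the paper does not state the exact ratio it used):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 10 000 manually classified windows: each example is an
# index paired with one of the four facies labels (A-D), 2500 per class as
# in the balanced set described above.
labels = np.repeat(np.array(list("ABCD")), 2500)
shuffled = rng.permutation(len(labels))

# Hold out 20% of the examples for testing; the 80/20 ratio is an
# assumption for illustration.
n_test = len(labels) // 5
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
```

Shuffling before splitting keeps the held-out set representative of all four facies, so test accuracy estimates out-of-sample performance rather than memorization.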

MACHINE-LEARNING

In order to classify seismic facies, we apply a typical machine-learning workflow (e.g.,

Abu-Mostafa et al., 2012). The basic idea of this workflow is to 'teach' a model to identify

seismic facies in a seismic section. Our workflow includes the following steps: (1) feature

extraction, (2) training, (3) testing, (4) model selection, and (5) application (Figure 3).

Figure 3: Machine-learning workflow in this study.

Feature extraction

Feature extraction aims to obtain as much information as possible about the object of

investigation. For this purpose, we extract so-called features, i.e., properties, which describe the

object we study. Here, this object is a seismic section (Figure 1) and the features are statistical

properties of seismic attributes inside a moving window. Because seismic attributes have been

specifically designed to highlight certain characteristics of seismic reflection data (see Randen

and Sønneland, 2005; Chopra and Marfurt, 2007), they are well-suited features.

Seismic attribute | Highlights | Parameters
Consistent Dip | Reflector dip | Output type: Dip and azimuth; Lateral filter radius: 0; Vertical filter radius: 2; Accuracy: 2
Cosine of Phase1 | Structural delineations | AGC length: 25
Dominant Frequency1 | Frequency content | Window length: 51
Envelope1 | Bright spots or strong interfaces | Window length: 51
GLCM(I)2 | Continuity | Window length: 51; Algorithm: Energy; Lower amplitude limit: 0.0; Upper amplitude limit: 1.0; Levels: 5; Split: 4; Lateral radius: 4; Vertical radius: 4
Instantaneous Bandwidth1 | Frequency range | Window length: 51
Instantaneous Frequency1 | Hydrocarbons, fractures or interfaces | Window length: 51
Instantaneous Phase1 | Continuities, faults, terminations or interfaces | Window length: 51
Instantaneous Quality1 | Fluid content or fractures | Window length: 51
Local Flatness1 | Channels or fractures | Orientation sigma X-Z: 2.0; Variance sigma X-Z: 2.5
Local Structural Dip3 | Dip | Principal component; Sigma X-Z: 1.5; Vertical radius: 12; Inline/xline radius: 1
Maximum Curvature | Discontinuities or distortions | Inline range: 5; Crossline range: 5; Vertical smoothing: 10; Dip correction: On
Reflection Intensity | Impedance contrasts | –
Second Derivative* | Continuity | –
Variance (Edge Method) | Faults and fractures | Inline scale: 1.5; Crossline scale: 1.5; Vertical scale: 1.5; Plane confidence threshold: 0.6; Dip guided smoothing: On

Table 1: Extracted seismic attributes. References: 1Taner and Sheriff (1977) and Taner et al. (1979); 2Haralick et al. (1973), Reed and Hussong (1989) and Gao (2003); 3Randen et al. (2000). *The Second Derivative attribute was calculated from the original seismic data. These attributes were selected because they provide sufficient information on different geological and geophysical characteristics of the data to allow accurate seismic facies predictions.

Figure 4: Calculated seismic attributes sorted in rows according to Table 1 starting with the original seismic section in the top left corner. Seismic

data courtesy of CGG.

After examining all seismic attributes available in Schlumberger Petrel 2015©, we extract 15 attributes, which allow accurate seismic facies predictions (see Table 1, Figure 4). Seismic-attribute extraction typically involves non-linear transformations (e.g., Hilbert transformation) of the original seismic data. As such, we can describe these calculations by:

A_i = T_i(D),   (1)

where D is the original data, T_i are the transformations, and A_i are the resulting seismic attributes, which were normalized.
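As an illustration of one such non-linear transformation, the envelope attribute of a single trace can be computed from its analytic signal. This is a self-contained numpy sketch, not the Petrel implementation used in the study:

```python
import numpy as np

def envelope(trace):
    """Envelope of a seismic trace via the FFT-based analytic signal.

    A numpy-only stand-in for one Hilbert-transform-based attribute
    calculation; the paper computes its attributes in Petrel.
    """
    n = len(trace)
    spectrum = np.fft.fft(trace)
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1:(n + 1) // 2] = 2.0      # double the positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0          # keep the Nyquist bin
    analytic = np.fft.ifft(spectrum * weights)   # trace + i * Hilbert(trace)
    return np.abs(analytic)

# A 30 Hz carrier with a slow amplitude modulation: the envelope recovers
# the modulation (the "bright spot" strength), not the oscillation itself.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
modulation = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)
trace = modulation * np.cos(2 * np.pi * 30 * t)
env = envelope(trace)
```

This is why the envelope highlights bright spots and strong interfaces (Table 1): it strips the oscillatory carrier and keeps reflection strength.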

Although this process provides a value at each point of the data, the nature of the seismic data requires an additional processing step. The seismic data (and therefore the seismic attributes) contain numerous small-scale variations, which only in combination form a seismic facies. This phenomenon is captured by calculating a series of statistics inside a moving window (100 × 100 samples) from these attributes. These statistics are the features that we use for machine-learning. Mathematically, we can describe this process as a deconstruction of the seismic attribute matrices (A_i) into a large number of matrices (A_i,w), one per window w:

A_i → {A_i,w}.   (2)

In each window, we calculate a series of statistics, i.e., the features (f_i,w):

f_i,w = s(A_i,w).   (3)

The statistics we use include: (1) the 20th percentile; (2) the 80th percentile; (3) the mean; (4) the standard deviation; (5) the standard error of the mean; (6) the skewness; and (7) the kurtosis.
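The windowing and statistics of Eqs. (2) and (3) can be sketched as follows. The non-overlapping window layout and the synthetic input are assumptions for illustration; the paper does not state the window stride:

```python
import numpy as np

def window_features(attr, win=100):
    """Per-window statistics for one seismic attribute (cf. Eqs. 2-3).

    attr: 2-D array of normalized attribute values (samples x traces).
    Returns one 7-element feature vector per non-overlapping win x win window.
    """
    rows = []
    for i in range(0, attr.shape[0] - win + 1, win):
        for j in range(0, attr.shape[1] - win + 1, win):
            w = attr[i:i + win, j:j + win].ravel()
            m, s = w.mean(), w.std()
            z = (w - m) / s if s > 0 else np.zeros_like(w)
            rows.append([
                np.percentile(w, 20),     # 20th percentile
                np.percentile(w, 80),     # 80th percentile
                m,                        # mean
                s,                        # standard deviation
                s / np.sqrt(w.size),      # standard error of the mean
                (z ** 3).mean(),          # skewness
                (z ** 4).mean() - 3.0,    # excess kurtosis
            ])
    return np.asarray(rows)

# A synthetic 200 x 300 "attribute" yields a 2 x 3 grid of windows.
feats = window_features(np.random.default_rng(1).normal(size=(200, 300)))
```

Each window thus collapses 10 000 attribute samples into 7 numbers, and concatenating these over the 15 attributes gives the feature vector for that window.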

Regularization

Figure 5: Hyperparameters selected based on a trade-off between model accuracy and simplicity for different algorithms during training. Grey areas indicate the standard deviation of accuracy between different folds (i.e., splits) during the cross-validation.

Using a large number of features can result in overfitting, where an overly complex model describes random errors or noise in the data. To avoid overfitting, we regularize our models, when possible, during training. Training during machine-learning usually involves the minimization of the in-sample error E_in, i.e., the difference between the predicted (y_pred) and the actual (y) result:

min E_in(y_pred, y).   (4)

Regularization introduces an additional constraint on the set of models:

min [E_in(y_pred, y) + λ · Ω],   (5)

where λ is the regularization parameter and Ω the penalty function. The regularization parameter was selected based on a trade-off between model accuracy and simplicity during training (see Figure 5). While we conduct no explicit feature selection, regularization can be regarded as an implicit method to constrain features.
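The text does not specify the penalty function Ω used for each model, so as a generic illustration the sketch below minimizes a squared in-sample error plus an L2 penalty (ridge regression) and shows that a larger regularization parameter yields a simpler (smaller-norm) model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression with many noisy features, which invites overfitting.
X = rng.normal(size=(50, 20))
y = X[:, 0] + 0.1 * rng.normal(size=50)   # only feature 0 actually matters

def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 (Eq. 5 with an L2 penalty)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_weak = ridge_fit(X, y, lam=1e-3)    # almost unregularized
w_strong = ridge_fit(X, y, lam=1e3)   # heavily regularized: shrunken weights
```

Sweeping lam and cross-validating the accuracy at each value reproduces the accuracy-versus-simplicity trade-off shown in Figure 5.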

Training

In this phase, we train several models to classify seismic facies using the training data from our manual interpretation. Training itself involves the minimization of the in-sample error, i.e., the difference between the predicted (y_pred) and the known result (y) (see Eq. 4). Because we distinguish between four seismic facies, we conduct a multi-class classification where the model output comprises four discrete classes (A, B, C, and D). While some classifiers can inherently handle multi-class problems, binary classifiers require one-vs-all or one-vs-one strategies to predict more than two classes. By covering the most common algorithms used for multi-class classification (see Table 2), we are able to compare their performance on this data set (Figure 5).
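The one-vs-all strategy can be illustrated with a toy stand-in for the binary learners: one confidence score per facies class, with the most confident class winning. The scorer below is simply the negative distance to a class mean, which is not one of the algorithms from Table 2; a real system would train one binary classifier per class:

```python
import numpy as np

rng = np.random.default_rng(3)

# Four facies classes (A-D) as clusters in a toy 2-D feature space.
classes = ["A", "B", "C", "D"]
centres = {"A": (0, 0), "B": (5, 0), "C": (0, 5), "D": (5, 5)}

X = np.vstack([rng.normal(centres[c], 0.5, size=(50, 2)) for c in classes])
y = np.repeat(classes, 50)

# "Train" one scorer per class: here, just the class mean.
means = {c: X[y == c].mean(axis=0) for c in classes}

def predict(x):
    """Each one-vs-all scorer reports a confidence; the highest one wins."""
    scores = {c: -np.linalg.norm(x - means[c]) for c in classes}
    return max(scores, key=scores.get)
```

The same wrapper logic turns any binary classifier with a confidence output into a four-class facies predictor.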

To improve the performance, we explore different kernels for some of the algorithms (see Table 2). Classification problems often become easier when we transform a feature vector (x) into a high-dimensional space (φ(x)). Explicit feature transformations (φ) can, however, be computationally expensive. Kernel functions (K) allow an implicit use of these high-dimensional spaces by calculating inner products between feature pair images:

K(x, x') = ⟨φ(x), φ(x')⟩.   (6)
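Eq. (6) can be checked numerically for a simple case: the homogeneous quadratic kernel K(x, z) = (x·z)² equals the inner product of explicit degree-2 feature maps. This example is illustrative and is not one of the kernels listed in Table 2:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def kernel(x, z):
    """Homogeneous quadratic kernel: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The kernel evaluates the inner product in the transformed 3-D space
# without ever forming phi(x) explicitly.
same = np.isclose(kernel(x, z), np.dot(phi(x), phi(z)))
```

For maps into much higher (or infinite) dimensional spaces, such as the radial basis function kernel, only the kernel form remains computationally feasible.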

Table 2: Classification algorithms with their strategy, kernel, tuned hyperparameter (name, range, and selected value), and default settings.

Adaboost (multi-class). Tuned hyperparameter: Max. depth, range [1, 10], selected: 3. Default settings: base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None.

Decision Tree (multi-class). Tuned hyperparameter: Max. depth, range [1, 10], selected: 7. Default settings: criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False.

Extra Trees (multi-class). Tuned hyperparameter: Estimators, range [1, 20], selected: 3. Default settings: n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None.

Gaussian Process (Cubic Kernel) (one-vs-all; kernel: cubic polynomial). No tuned hyperparameter. Default settings: alpha=1e10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None.

Gaussian Process (Radial Basis Function) (one-vs-all; kernel: radial basis function). No tuned hyperparameter. Default settings: alpha=1e10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None.

Gradient Boosting (multi-class). Tuned hyperparameter: Estimators, range [1, 40], selected: 4. Default settings: loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto'.

K-nearest Neighbor (one-vs-all). Tuned hyperparameter: Neighbors, range [1, 20], selected: 4. Default settings: radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, n_jobs=1, **kwargs.

Linear Discriminant Analysis (multi-class). No tuned hyperparameter. Default settings: solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001.

Logistic Regression (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: penalty='l2', dual=False, tol=0.0001, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1.

Neural Network (Identity) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Neural Network (Logistic) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 0.1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Neural Network (Rectified Linear Unit) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Neural Network (Hyperbolic Tangent) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Quadratic Discriminant Analysis (one-vs-all). Tuned hyperparameter: Regularization, range [-2, 2], selected: 0. Default settings: priors=None, store_covariance=False, tol=0.0001, store_covariances=None.

Random Forest (multi-class). Tuned hyperparameter: Estimators, range [1, 50], selected: 3. Default settings: criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None.

Stochastic Gradient Descent (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 10e-4. Default settings: loss='hinge', penalty='l2', l1_ratio=0.15, fit_intercept=True, max_iter=None, tol=None, shuffle=True, verbose=0, epsilon=0.1, n_jobs=1, random_state=None, learning_rate='optimal', eta0=0.0, power_t=0.5, class_weight=None, warm_start=False, average=False, n_iter=None.

Support Vector Machine

(Cubic)

one-vsone

Cubic

Polynomial

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

10e-5

Support Vector Machine

(Linear)

one-vsone

Linear

Function

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

1

Support Vector Machine

(Quadratic)

one-vsone

Quadratic

Polynomial

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

10e-4

Support Vector Machine

(Radial Basis Function)

one-vsone

Radial Basis

Function

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

1

Table 2: Algorithms, default- and hyperparameters in this study. The hyperparameters were selected based on a trade-off between model accuracy and simplicity (see Figure 5). For more information, we refer the reader to the scikit-learn package (Pedregosa et al., 2011).
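As a concrete illustration of this kind of hyperparameter sweep, a cross-validated grid search over the regularization parameter of a cubic-kernel SVM could be sketched with scikit-learn's GridSearchCV. This is a hypothetical sketch, not the paper's code: the data here are synthetic stand-ins for the seismic attributes, and the grid is a shortened version of the [10e-5, 10e5] range in Table 2, for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the seismic-attribute data (four classes).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# Sweep the regularization parameter C with ten-fold cross-validation
# and keep the value with the best average test accuracy.
search = GridSearchCV(SVC(kernel='poly', degree=3, gamma='auto'),
                      param_grid={'C': np.logspace(-4, 2, 7)},
                      cv=10)
search.fit(X, y)
print(search.best_params_['C'], round(search.best_score_, 3))
```

The selected value then depends on both accuracy and, as the caption notes, a preference for simpler (more strongly regularized) models.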

As such, kernels allow us to use high-dimensional feature spaces without specifying them or the explicit transformation. Here, we use polynomial ($k_{\mathrm{poly}}$) and radial basis kernel functions ($k_{\mathrm{rbf}}$):

$$k_{\mathrm{poly}}(x_i, x_j) = (x_i \cdot x_j + c)^d \quad (7)$$

$$k_{\mathrm{rbf}}(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right) \quad (8)$$

in combination with Support Vector Machine, Gaussian Process, and Neural Network classifiers

(see Table 2).
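A minimal NumPy sketch of these two kernel functions may make the formulas concrete; the symbol names c, d, and gamma follow the standard kernel definitions, as the paper's exact notation did not survive this copy:

```python
import numpy as np

def polynomial_kernel(xi, xj, c=1.0, d=3):
    # k(xi, xj) = (xi . xj + c)^d; d=3 gives the cubic kernel of Table 2.
    return (np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # k(xi, xj) = exp(-gamma * ||xi - xj||^2)
    diff = np.asarray(xi) - np.asarray(xj)
    return np.exp(-gamma * np.sum(diff ** 2))

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
print(polynomial_kernel(x1, x2))  # (0 + 1)^3 = 1.0
print(rbf_kernel(x1, x1))         # identical points give 1.0
```

Both functions return a similarity score without ever constructing the high-dimensional feature vectors explicitly, which is the point of the kernel trick.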

Cross-validation

During validation, we determine the model performances on yet unseen data. A simple holdout validation splits the data into two subsets: one for training and one for testing. This approach, however, leads to a dilemma, as we would like to maximize both subsets: the training set to generate well-constrained models, and the test set to obtain reliable estimates of model performance. This dilemma is resolved by cross-validation, i.e., splitting the data multiple times and averaging performance estimates between folds. We apply a ten-fold stratified cross-validation, where the data is split into a training set (90% of the data) and a test set (10% of the data) ten times, while preserving the percentage of examples of each class. To visualize model performances, we calculate an average confusion matrix for each model (Figure 6).
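The ten-fold stratified scheme described above can be sketched with scikit-learn's StratifiedKFold; the labels here are synthetic, with four balanced classes standing in for the facies:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Four balanced classes, 100 examples each (stand-in for facies A-D).
y = np.repeat([0, 1, 2, 3], 100)
X = np.random.RandomState(0).randn(len(y), 5)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, test_idx in skf.split(X, y):
    # Each fold: 90% training, 10% testing, class proportions preserved.
    fold_sizes.append(len(test_idx))
    assert np.bincount(y[test_idx]).tolist() == [10, 10, 10, 10]
print(fold_sizes)  # ten test folds of 40 examples each
```

Stratification is what guarantees that every fold contains the same fraction of each class, so the averaged confusion matrices are not biased by uneven class sampling.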

Figure 6: Confusion matrices of trained models showing the average number of correct classifications on the diagonal and the average number of incorrect classifications off the diagonal. Averages result from ten-fold cross-validation. Note that the classes were balanced, so that the confusion matrices visualize class-wise model accuracies.

# | Model | Facies A Pre/Rec/f1 | Facies B Pre/Rec/f1 | Facies C Pre/Rec/f1 | Facies D Pre/Rec/f1 | Overall Pre/Rec/f1 | Acc
1 | Support Vector Machine (Cubic) | 0.995/0.985/0.990 | 0.975/0.992/0.983 | 0.992/0.974/0.983 | 0.971/0.982/0.977 | 0.983/0.983/0.983 | 0.983±0.004
2 | Support Vector Machine (Radial Basis Function) | 0.995/0.984/0.989 | 0.974/0.994/0.984 | 0.991/0.973/0.982 | 0.973/0.980/0.977 | 0.983/0.983/0.983 | 0.983±0.004
3 | Support Vector Machine (Quadratic) | 0.993/0.984/0.989 | 0.974/0.992/0.983 | 0.992/0.972/0.982 | 0.971/0.982/0.976 | 0.983/0.982/0.982 | 0.983±0.004
4 | Support Vector Machine (Linear) | 0.993/0.984/0.988 | 0.973/0.988/0.980 | 0.991/0.975/0.983 | 0.970/0.979/0.974 | 0.982/0.981/0.981 | 0.982±0.004
5 | K-nearest Neighbor | 0.991/0.987/0.989 | 0.979/0.988/0.983 | 0.985/0.974/0.979 | 0.972/0.977/0.974 | 0.982/0.981/0.981 | 0.982±0.003
6 | Neural Network (Rectified Linear Unit) | 0.994/0.982/0.988 | 0.966/0.992/0.979 | 0.988/0.976/0.982 | 0.975/0.970/0.972 | 0.980/0.980/0.980 | 0.980±0.005
7 | Neural Network (Logistic Activation Function) | 0.990/0.984/0.987 | 0.972/0.982/0.977 | 0.988/0.975/0.982 | 0.968/0.975/0.971 | 0.980/0.979/0.979 | 0.980±0.004
8 | Extra Tree | 0.986/0.985/0.986 | 0.975/0.982/0.979 | 0.981/0.974/0.977 | 0.969/0.971/0.970 | 0.978/0.978/0.978 | 0.978±0.004
9 | Random Forest | 0.987/0.983/0.985 | 0.974/0.982/0.978 | 0.982/0.974/0.977 | 0.969/0.972/0.971 | 0.978/0.978/0.978 | 0.978±0.004
10 | Neural Network (Hyperbolic Tangent Activation Function) | 0.990/0.984/0.987 | 0.969/0.983/0.976 | 0.983/0.976/0.979 | 0.970/0.967/0.968 | 0.978/0.977/0.977 | 0.978±0.003
11 | Gradient Boosting | 0.986/0.983/0.984 | 0.973/0.979/0.976 | 0.980/0.975/0.977 | 0.969/0.969/0.969 | 0.977/0.977/0.977 | 0.977±0.003
12 | Logistic Regression | 0.984/0.980/0.982 | 0.968/0.977/0.973 | 0.982/0.976/0.979 | 0.970/0.969/0.969 | 0.976/0.976/0.976 | 0.976±0.003
13 | Quadratic Discriminant Analysis | 0.984/0.985/0.985 | 0.970/0.983/0.976 | 0.947/0.987/0.966 | 0.985/0.928/0.956 | 0.971/0.971/0.971 | 0.971±0.004
14 | Decision Tree | 0.986/0.976/0.981 | 0.963/0.981/0.972 | 0.978/0.962/0.970 | 0.957/0.965/0.961 | 0.971/0.971/0.971 | 0.971±0.003
15 | Neural Network (Identity Activation Function) | 0.971/0.973/0.972 | 0.957/0.962/0.959 | 0.983/0.976/0.979 | 0.968/0.966/0.967 | 0.970/0.969/0.969 | 0.970±0.005
16 | Adaboost | 0.977/0.972/0.974 | 0.966/0.972/0.968 | 0.972/0.964/0.968 | 0.961/0.964/0.962 | 0.969/0.968/0.968 | 0.969±0.011
17 | Linear Discriminant Analysis | 0.980/0.968/0.974 | 0.951/0.975/0.963 | 0.983/0.958/0.970 | 0.954/0.964/0.959 | 0.967/0.967/0.967 | 0.967±0.004
18 | Gaussian Process (Cubic) | 0.973/0.972/0.972 | 0.965/0.956/0.960 | 0.973/0.964/0.968 | 0.943/0.956/0.948 | 0.964/0.962/0.962 | 0.964±0.004
19 | Stochastic Gradient Descent | 0.957/0.974/0.965 | 0.957/0.940/0.948 | 0.993/0.945/0.969 | 0.934/0.977/0.955 | 0.960/0.959/0.959 | 0.960±0.004
20 | Gaussian Process (Radial Basis Function) | 0.762/0.504/0.525 | 0.587/0.724/0.628 | 0.889/0.752/0.789 | 0.701/0.660/0.647 | 0.735/0.660/0.647 | 0.735±0.127

Table 3: Parameters describing model performance on each class (i.e., facies) and overall, i.e., precision, recall, f1-score, and support as well as

accuracy. Models are sorted by accuracy. Standard deviations of model accuracies between different folds of cross-validation are listed in the last

column. Note that models are balanced with a support of 2500 for each class.

To quantify the model performance, we calculate: (1) precision, (2) recall, and (3) f1-score, and their averages for each model (see Table 3). Precision describes the ability of classifiers to predict classes correctly. Recall (or sensitivity) describes the ability of classifiers to find all examples of a class. The f1-score is an equally weighted harmonic mean of precision and recall, and support is simply the number of examples of each class. Furthermore, we calculate the average accuracy of each model and determine its standard deviation between folds (see Table 3). Note that regularization, training, and cross-validation were implemented in Python using the scikit-learn package (Pedregosa et al., 2011).
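The three per-class metrics can be written out directly from their definitions; this is a from-scratch sketch rather than the paper's code (scikit-learn's precision_recall_fscore_support computes the same quantities):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, cls):
    """Per-class precision, recall, and f1-score for class `cls`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))  # true positives
    fp = np.sum((y_pred == cls) & (y_true != cls))  # false positives
    fn = np.sum((y_pred != cls) & (y_true == cls))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred, cls=0)
print(p, r, f)  # precision 2/3, recall 2/3, f1 2/3
```

Support, the remaining quantity in Table 3, is simply `np.sum(np.asarray(y_true) == cls)` for each class.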

Model selection

Model selection is based on the generalization performance of trained models on test data. In our case, the model using a support vector machine with a cubic kernel function shows the highest accuracy, precision, and recall of all models (see Table 3, Figure 7). This means that this model not only classifies seismic facies most accurately, but is also the best at avoiding incorrect classifications (Figure 6).

Application

After model selection, it is recommended to retrain the best model using the entire data set available, i.e., training plus test data (Abu-Mostafa et al., 2012). This final model is subsequently applied to the entire seismic section (Figure 7).
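That final step, retraining the selected model on all available data and then classifying every sample, might look like the following sketch. The data are synthetic placeholders; the cubic-kernel SVC and its regularization value (10e-5) follow Table 2:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Training plus test data combined (synthetic stand-in for labelled examples).
X_all, y_all = make_classification(n_samples=400, n_features=8,
                                   n_informative=5, n_classes=4,
                                   random_state=1)

# Retrain the selected model (cubic-kernel SVM) on the full data set.
final_model = SVC(kernel='poly', degree=3, gamma='auto', C=1e-4)
final_model.fit(X_all, y_all)

# Apply it to every sample of the (here: synthetic) seismic section.
X_section = np.random.RandomState(2).randn(1000, 8)
facies = final_model.predict(X_section)
print(facies.shape)  # one predicted facies label per section sample
```

Because no data is held out at this stage, the reported performance numbers still come from the earlier cross-validation, not from this final fit.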

Figure 7: a) Seismic section and b) final classification result. Seismic data courtesy of CGG.

CLASSIFICATION RESULTS

Test

Test results, obtained during cross-validation, include confusion matrices and descriptive metrics. Confusion matrices visualize the precision of models for each class (Figure 6). When the model predicts the correct class, the sample contributes to the diagonal of the confusion matrix; when the model predicts the wrong class, the sample contributes to the off-diagonal cells. The first element of the confusion matrix (top left) shows the precision for the first class (i.e., horizontal); the second element (first row, second column) shows the percentage of samples classified into the second class (i.e., dipping) despite belonging to the first class (i.e., horizontal), and so on. As such, confusion matrices show how well each model predicts each class. In general, the observed variations between models and classes are minor (
