I am asking you to create PowerPoint slides and a doc file (3 pages, single spaced) explaining the research paper, following the PowerPoint slide format below.

Presentation Format

Introduction

What problem is discussed in the paper and why do we care?

What data was used in the study?

Methods/Techniques. How were these data analyzed?

What were their primary findings?

What are the implications of these findings?

Challenges/Shortcomings. Are there potential problems with this study, or are their assumptions reasonable? As scientists we are naturally skeptical of others' work. Which parts of this study are you skeptical about, and why?

Conclusion

Note: I have provided an example of a PowerPoint slide and a doc file, which gives an idea of what I am asking for. Also, the PowerPoint must include at least 3 figures, with an explanation of each one in the doc file.

On-Site Method for Early Earthquake Warning

"A P-Wave Based System"

Mundhir Alfarsi

Introduction

▪ In this presentation we will discuss the importance of EEW devices, the challenges they face, and the implications of their use.

▪ According to Colombelli (2015), early earthquake warning is a concept being integrated in most countries nowadays. These mechanisms are adopted to counter the destructive effects of earthquakes.

▪ Research shows that EEW monitoring devices are designed to detect an ongoing earthquake beforehand. After detecting potential earthquake movement, the system sends warnings to target areas for early mitigation plans.

▪ These systems were first developed by the National Research Institute for Earth Science and Disaster Prevention in conjunction with the Japan Meteorological Agency in 2007.

1. Problem: What problem is discussed in the paper and why do we care?

▪ Defining the effectiveness of an EEW system located at a given site in detecting earthquake waves and ground motions that originate far from the site.

▪ Defining thresholds for alarm warnings.

2. Data: What data was used in the study?

▪ Data obtained from previous earthquake events in Japan.

▪ 76 events were selected to develop the methodology.

▪ 73 of these events were used to draw empirical conclusions, while 3 events were used to validate the results obtained.

Figure 1: Events data distribution.

3. Methods/Techniques: How were these data analyzed?

▪ A quantitative method of data collection was used to collect the data samples.

Figure 2: Example of vertical components.

4. Findings: What were the primary findings?

▪ The primary findings show that when all variables are used concurrently, a more reliable and significant alarm performance is achieved.

▪ They also show the ability of the alarm system to rapidly provide warnings.

Figure 3: Overall performance of the method used.

6. Challenges/Shortcomings: Are there potential problems with this study, or are their assumptions reasonable? As scientists we are naturally skeptical of others' work. Which parts of this study are you skeptical about, and why?

▪ The challenge of estimating or predicting ground-shaking levels with utmost certainty.

▪ S-wave variables are not taken into account when computing the threshold limits of the EEW system.

Conclusion

▪ In conclusion, despite the system's shortcomings in detecting larger events, it has advantages beyond the main one, which is detecting possible ground motions within a given radius of targeted areas. The EEW system can also be tuned at the user's discretion.

▪ Secondly, the proposed methodology can provide shorter or longer warning and declaration times when compared to other standard approaches in use.

References

▪ Colombelli, S. (2015). A P-wave-based on-site method for early earthquake warning. AGU Publications, pp. 1390–1398.

Intro:

Hi everyone, today I am going to talk about "A P-Wave Based System: On-Site Method for Early Earthquake Warning," which was discussed in a geophysics research paper. This presentation will discuss the importance of early earthquake warning devices, the challenges they face, and the implications of their use.

Earthquakes are natural hazards that occur as a result of movements within the Earth's crust or volcanic action, which create seismic waves. Earthquakes can cause huge destructive effects that can destroy a whole city. Thus, the paper discusses a technique developed to avoid the destructive effects of earthquakes. According to the paper, early earthquake warning is a concept being integrated in most countries nowadays. These mechanisms are adopted to counter the destructive effects of earthquakes.

The technique discussed in the paper is a developed early earthquake warning system: a real-time seismic monitoring system that detects an ongoing earthquake and provides a warning to the target area before the arrival of the most destructive waves, allowing early mitigation plans.

These systems were first developed by the National Research Institute for Earth Science and Disaster Prevention in conjunction with the Japan Meteorological Agency in 2007.

Problem:

The problem discussed in the paper is defining the effectiveness of these early earthquake warning systems, located at a given site, in detecting earthquake waves and ground motions that originate far from the site, and defining thresholds for alarm warnings.

According to the paper, one of the main problems is the adverse economic and physical losses that cities affected by earthquakes face. Some of the countries that suffer such events are Japan and Mexico, as a result of their sensitive geological locations.

Data:

The data used in this research were obtained from previous earthquake events in Japan. The data come from 76 events with magnitudes of 4.0–9.0 that took place in Japan. 73 of the 76 events were used as controls to establish empirical correlations, and the other 3 were used to validate the results obtained from the empirical calculations. Finally, a radius of 500 km was secured as a sample for each event, for approximately 12,792 km in total.

The map shows the distribution of stations and the epicentral locations of the selected events, represented by gray dots; the green stars are the earthquakes used for calibration, while the red stars are the scenario events.

Method:

The method used in this paper is based on two primary keys. The first is the continuous measurement of three peak amplitude parameters (the initial peaks of displacement, Pd, velocity, Pv, and acceleration, Pa) on the vertical component of ground motion recordings. The second is the use of an empirical combination of the three ground motion parameters to predict the ensuing peak ground velocity (PGV) at the same site. The observed parameters are compared to threshold values and converted into a single, dimensionless variable. A local alert is issued as soon as the empirical combination exceeds a given threshold.

The techniques used were quantitative in nature. Calculations were based on earthquake wave intensity in terms of the displacement (d), the velocity (v) of each successive wave, and the acceleration (a).

As for the thresholds, which are the amplitude limits for each parameter, these are represented by Px(l) and Px(h), indicating the lower and higher thresholds for the variable x (where x is d, v, or a).

For each parameter, a logical variable Wx(t) is computed from the current peak value Px(t), and the three variables are summed, as shown in these equations:

Wx(t) = 0                                           if Px(t) < Px(l)

Wx(t) = (1/3) · [Px(t) − Px(l)] / [Px(h) − Px(l)]   if Px(l) ≤ Px(t) < Px(h)

Wx(t) = 1/3                                         if Px(t) ≥ Px(h)

W(t) = Wa(t) + Wv(t) + Wd(t)
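The threshold logic above can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation, and the threshold pairs below are placeholders rather than the calibrated values from the study.

```python
# Sketch of the threshold-based alert variable. Threshold values are
# hypothetical, for illustration only.

def w_x(p_x, p_low, p_high):
    """Contribution of one peak parameter (Pd, Pv or Pa) to the alert level."""
    if p_x < p_low:
        return 0.0
    if p_x >= p_high:
        return 1.0 / 3.0
    return (1.0 / 3.0) * (p_x - p_low) / (p_high - p_low)

def alert_level(pd, pv, pa, thresholds):
    """Combine the three parameters into the single dimensionless variable W(t)."""
    return (w_x(pd, *thresholds["d"]) +
            w_x(pv, *thresholds["v"]) +
            w_x(pa, *thresholds["a"]))

# Hypothetical lower/upper thresholds for displacement, velocity, acceleration.
thr = {"d": (0.01, 0.1), "v": (0.001, 0.01), "a": (0.1, 1.0)}

# W(t) reaches 1 when every parameter exceeds its upper threshold; an alert
# is issued once W(t) exceeds a chosen decision level.
level = alert_level(0.2, 0.02, 2.0, thr)
```

Because each parameter contributes at most 1/3, an alert driven by all three parameters together is more robust than one triggered by a single noisy measurement.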

Here we have a figure showing an example of the vertical components. From top to bottom: the vertical component of the acceleration (black), velocity (blue), and displacement (orange) signals. The initial peak amplitude parameters Pd, Pv, and Pa are measured as the absolute maximum of each waveform in the early portion of the P wave. The threshold values for each parameter are schematically shown as a black dashed line on each record. Examples of Wa, Wv, and Wd as functions of time are also shown with solid black, blue, and orange lines, respectively. Each step in Wa, Wv, and Wd corresponds to an increase of the peak amplitude parameter (as an absolute value) on the ground motion records. Finally, the bottom plot shows the cumulative logical variable W(t) as a function of time (purple curve).

Findings:

From the data collected and the computations done, the primary findings were extracted from the percentages of successful alarm triggers. Figure 3 represents the performance of the method used in this paper. Panels (a, d) are histograms showing the performance of the system in terms of successful alerts (dark green bars), successful no-alerts (light green bars), false alerts (yellow bars), and missed alerts (red bars) for the two different intensity levels. Panels (b, e) show alert times as a function of distance, where the black line is the theoretical alert time for the fixed P-wave time window on-site system (3 s). Finally, panels (c, f) show lead times as a function of distance, where the black line is the theoretical lead time for the fixed P-wave time window on-site system, while the green line is the best-fit regression of the observed lead times as a function of distance.

The results in Figure 3 show that when all variables are used concurrently, a significant alarm performance is achieved. They also show the ability of the alarm system to rapidly provide warnings.

Challenges/Shortcomings:

One of the challenges, in my point of view, is estimating or predicting ground-shaking levels with utmost certainty, since the paper focuses only on P-wave time window variables for the computation of intensity while ignoring S-wave variables when computing the threshold limits of the early earthquake warning system. I believe that if S-wave variables are not taken into account at the initial stages, especially when larger events take place, the alarms might not be able to detect them.

Conclusion:

In conclusion, despite the system's shortcomings in detecting larger events, it has advantages beyond the main one, which is detecting possible ground motions within a given radius of targeted areas. The EEW system can also be tuned at the user's discretion. Secondly, the proposed methodology can provide shorter or longer warning and declaration times when compared to other standard approaches in use.


SEISMIC FACIES ANALYSIS USING MACHINE-LEARNING

T. Wrona1, Indranil Pan2, R.L. Gawthorpe3 and H. Fossen4

Right Running Head: MACHINE-LEARNING BASED FACIES ANALYSIS

1 Department of Earth Science, University of Bergen, Allégaten 41, N-5007 Bergen, Norway. E-mail: thilo.wrona@uib.no

2 Department of Earth Science and Engineering, Imperial College, Prince Consort Road, London, SW7 2BP, UK. E-mail: indranilpan@gmail.com

3 Department of Earth Science, University of Bergen, Allégaten 41, N-5007 Bergen, Norway. E-mail: rob.gawthorpe@uib.no

4 Department of Earth Science, University of Bergen, Allégaten 41, N-5007 Bergen, Norway. E-mail: haakon.fossen@uib.no

ABSTRACT

Seismic interpretations are, by definition, subjective and often require significant time and

expertise from the interpreter. We demonstrate that machine-learning techniques can help

address these problems by performing seismic facies analyses in a rigorous, repeatable way. For

this purpose, we use state-of-the-art 3D broadband seismic reflection data of the northern North

Sea. Our workflow includes five basic steps. First, we extract seismic attributes to highlight

features in the data. Second, we perform a manual seismic facies classification on 10 000

examples. Third, we use some of these examples to train a range of models to predict seismic

facies. Fourth, we analyze the performance of these models on the remaining examples. Fifth, we

select the 'best' model (i.e., highest accuracy) and apply it to a seismic section. As such, we

highlight that machine-learning techniques can increase the efficiency of seismic facies analyses.

INTRODUCTION

Seismic reflection data is a key source of information in numerous fields of geoscience,

including sedimentology and stratigraphy (e.g., Vail, 1987; Posamentier, 2004), structural

geology (Baudon and Cartwright, 2008; Jackson et al., 2014), geomorphology (e.g., Posamentier

and Kolla, 2003; Cartwright and Huuse, 2005; Bull et al., 2009) and volcanology (e.g., Hansen et

al., 2004; Planke et al., 2005; Magee et al., 2013). However, the often subjective and non-unique

interpretation of seismic reflection data has led to longstanding debates based on contrasting

geological interpretations of the same or similar data sets (e.g., Stewart and Allen, 2002;

Underhill, 2004). Moreover, seismic interpretations require significant amounts of time,

experience, and expertise from interpreters (e.g., Bond et al., 2012; Bond, 2015; Macrae et al.,

2016). We believe that machine-learning techniques can help the interpreters reduce some of

these problems associated with seismic facies analyses.

Machine-learning describes a set of computational methods that are able to learn from

data to make accurate predictions. Previous applications of machine-learning to seismic

reflection data focus on the detection of geological structures, such as faults and salt bodies (e.g.,

Hale, 2013; Zhang et al., 2014; Guillen et al., 2015; Araya-Polo et al., 2017; Huang et al., 2017)

and unsupervised seismic facies classification, where an algorithm chooses the number and types

of facies (e.g., Coléou et al., 2003; de Matos et al., 2006). While early studies primarily used

clustering algorithms to classify seismic data (e.g., Barnes and Laughlin, 2002; Coléou et al.,

2003), recent studies focus on the application of artificial neural networks (e.g., de Matos et al.,

2006; Huang et al., 2017). To demonstrate the strength of these advanced algorithms, this study

compares 20 different classification algorithms (e.g., K-nearest neighbor, Support Vector

Machines and Artificial Neural Networks).

While these unsupervised classification algorithms are, in theory, able to identify the

main seismic facies in a given data set, in practice it can be difficult to correlate these

automatically classified facies to existing geological units. This correlation can be done by

visualizing facies in a lower dimensional space (e.g., Gao, 2007) or self-organized maps (e.g.,

Coléou et al., 2003). As an alternative, we introduce a simple supervised machine-learning

workflow, where the user can define the number and type of seismic facies used for

classification. This approach avoids the correlation by allowing the user to adapt the workflow to

a given problem, where seismic facies can be based on existing geological units.

To demonstrate the advantages of this approach, we describe the application of

supervised machine-learning to a seismic facies analysis using 3D broadband seismic reflection

data of the northern North Sea. Our workflow consists of five basic steps. First, we extract

features from the data by calculating 15 seismic attributes. Second, we generate training data by

manually sorting 10 000 examples into four facies. Third, we train 20 models to classify seismic

facies using some of these examples. Fourth, we assess the performance of these models using

the remaining examples. Fifth, we select the 'best' model based on its performance and apply it

to a seismic section. Our results demonstrate that machine-learning algorithms are able to

perform seismic facies analyses, which are crucial to map sedimentary sequences, structural

elements, and fluid contacts.

3D SEISMIC REFLECTION DATA

Figure 1: Seismic section (courtesy of CGG) used for automated seismic facies analysis.

This study uses state-of-the-art 3D broadband seismic reflection data (CGG

Broadseis™) of the northern North Sea (Figure 1). The data covers an area of 35,410 km² and

was acquired using a series of up to 8-km-long streamers towed ~40 m deep. The data recording

extends to 9 s with a time sampling of 4 ms. Broadseis data covers a wide range of frequencies

reaching from 2.5 to 155 Hz (Firth et al., 2014). The binning size was 12.5 × 18.75 m. The data

was 3-D true amplitude Kirchhoff pre-stack time migrated. The seismic volume was zero-phase

processed with SEG normal polarity; i.e., a positive reflection (white) corresponds to an

acoustic-impedance increase with depth. The data was time-migrated using a pre-stack

algorithm.

MANUAL INTERPRETATION

Supervised machine-learning requires a subset of the data set for training and testing

models. We therefore select a reasonable number (10 000) of examples to perform a manual

seismic facies classification. This number represents a trade-off between the time required for

manual classification and the achieved model accuracy (>0.95). After testing different sizes

(from 10 × 10 to 500 × 500 samples), we selected an example size of 100 × 100 samples (Figure

2), which results in high model accuracies (>0.95). The classification follows standard schemes

for seismic facies developed based on numerous studies (e.g., Brown, 2004; Bacon et al., 2007;

Kearey et al., 2009). The four facies (i.e., classes) that we use for classification are: A)

continuous, horizontal reflections; B) continuous, dipping reflections; C) discontinuous, crisscrossing reflections; and D) discontinuous, chaotic reflections (Figure 2). These four are

probably the most common basic seismic facies. Since almost all geological structures show at

least one of these facies in seismic reflection data, classifying them accurately would allow us to

map a wide range of structures.

Figure 2: Representative examples of manual seismic facies classification of the four seismic facies chosen

for this study: a) continuous, horizontal reflections; b) continuous, dipping reflections; c) discontinuous,

crisscrossing reflections; and d) discontinuous, chaotic reflections. The horizontal axes show distance in

meters and the vertical axes show two-way traveltime in milliseconds. The number of examples (10 000) is

balanced across classes with each of the four classes containing the same number of examples (2500).

Seismic data courtesy of CGG.
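Supervised learning on the labelled examples starts with a split of the classified windows into training and test sets. A minimal numpy sketch, assuming the balanced 10 000-example set described above and an illustrative 80/20 split (the paper does not state the exact ratio it used):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 10 000 manually classified windows: each example is an
# index paired with one of the four facies labels (A-D), 2500 per class as
# in the balanced set described above.
labels = np.repeat(np.array(list("ABCD")), 2500)
shuffled = rng.permutation(len(labels))

# Hold out 20% of the examples for testing; the 80/20 ratio is an
# assumption for illustration.
n_test = len(labels) // 5
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
```

Shuffling before splitting keeps the held-out set representative of all four facies, so test accuracy estimates out-of-sample performance rather than memorization.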

MACHINE-LEARNING

In order to classify seismic facies, we apply a typical machine-learning workflow (e.g.,

Abu-Mostafa et al., 2012). The basic idea of this workflow is to 'teach' a model to identify

seismic facies in a seismic section. Our workflow includes the following steps: (1) feature

extraction, (2) training, (3) testing, (4) model selection, and (5) application (Figure 3).

Figure 3: Machine-learning workflow in this study.

Feature extraction

Feature extraction aims to obtain as much information as possible about the object of

investigation. For this purpose, we extract so-called features, i.e., properties, which describe the

object we study. Here, this object is a seismic section (Figure 1) and the features are statistical

properties of seismic attributes inside a moving window. Because seismic attributes have been

specifically designed to highlight certain characteristics of seismic reflection data (see Randen

and Sønneland, 2005; Chopra and Marfurt, 2007), they are well-suited features.

Seismic attribute | Highlights | Parameters
Consistent Dip | Reflector dip | Output type: Dip and azimuth; Lateral filter radius: 0; Vertical filter radius: 2; Accuracy: 2
Cosine of Phase1 | Structural delineations | AGC length: 25
Dominant Frequency1 | Frequency content | Window length: 51
Envelope1 | Bright spots or strong interfaces | Window length: 51
GLCM(I)2 | Continuity | Window length: 51; Algorithm: Energy; Lower amplitude limit: 0.0; Upper amplitude limit: 1.0; Levels: 5; Split: 4; Lateral radius: 4; Vertical radius: 4
Instantaneous Bandwidth1 | Frequency range | Window length: 51
Instantaneous Frequency1 | Hydrocarbons, fractures or interfaces | Window length: 51
Instantaneous Phase1 | Continuities, faults, terminations or interfaces | Window length: 51
Instantaneous Quality1 | Fluid content or fractures | Window length: 51
Local Flatness1 | Channels or fractures | Orientation sigma X-Z: 2.0; Variance sigma X-Z: 2.5
Local Structural Dip3 | Dip | Principal component; Sigma X-Z: 1.5; Vertical radius: 12; Inline/xline radius: 1
Maximum Curvature | Discontinuities or distortions | Inline range: 5; Crossline range: 5; Vertical smoothing: 10; Dip correction: On
Reflection Intensity | Impedance contrasts | –
Second Derivative* | Continuity | –
Variance (Edge Method) | Faults and fractures | Inline scale: 1.5; Crossline scale: 1.5; Vertical scale: 1.5; Plane confidence threshold: 0.6; Dip guided smoothing: On

Table 1: Extracted seismic attributes. References: 1Taner and Sheriff (1977) and Taner et al. (1979); 2Haralick et al. (1973), Reed and Hussong (1989) and Gao (2003); 3Randen et al. (2000). *The Second Derivative attribute was calculated from the original seismic data. These attributes were selected because they provide sufficient information on different geological and geophysical characteristics of the data to allow accurate seismic facies predictions.

Figure 4: Calculated seismic attributes sorted in rows according to Table 1 starting with the original seismic section in the top left corner. Seismic

data courtesy of CGG.

After examining all seismic attributes available in Schlumberger Petrel 2015©, we extract 15 attributes, which allow accurate seismic facies predictions (see Table 1, Figure 4). Seismic-attribute extraction typically involves non-linear transformations (e.g., Hilbert transformation) of the original seismic data. As such, we can describe these calculations by:

A_i = T_i(D),   (1)

where D is the original data, T_i are the transformations, and A_i are the resulting seismic attributes, which were normalized.
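As an illustration of one such non-linear transformation, the envelope attribute of a single trace can be computed from its analytic signal. This is a self-contained numpy sketch, not the Petrel implementation used in the study:

```python
import numpy as np

def envelope(trace):
    """Envelope of a seismic trace via the FFT-based analytic signal.

    A numpy-only stand-in for one Hilbert-transform-based attribute
    calculation; the paper computes its attributes in Petrel.
    """
    n = len(trace)
    spectrum = np.fft.fft(trace)
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1:(n + 1) // 2] = 2.0      # double the positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0          # keep the Nyquist bin
    analytic = np.fft.ifft(spectrum * weights)   # trace + i * Hilbert(trace)
    return np.abs(analytic)

# A 30 Hz carrier with a slow amplitude modulation: the envelope recovers
# the modulation (the "bright spot" strength), not the oscillation itself.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
modulation = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)
trace = modulation * np.cos(2 * np.pi * 30 * t)
env = envelope(trace)
```

This is why the envelope highlights bright spots and strong interfaces (Table 1): it strips the oscillatory carrier and keeps reflection strength.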

Although this process provides a value at each point of the data, the nature of the seismic data requires an additional processing step. The seismic data (and therefore the seismic attributes) contain numerous small-scale variations, which only in combination form a seismic facies. This phenomenon is captured by calculating a series of statistics inside a moving window (100 × 100 samples) from these attributes. These statistics are the features that we use for machine-learning. Mathematically, we can describe this process as a deconstruction of the seismic attribute matrices (A_i) into a large number of matrices (A_i,w), one per window w:

A_i → {A_i,w}.   (2)

In each window, we calculate a series of statistics, i.e., the features (f_i,w):

f_i,w = s(A_i,w).   (3)

The statistics we use include: (1) the 20th percentile; (2) the 80th percentile; (3) the mean; (4) the standard deviation; (5) the standard error of the mean; (6) the skewness; and (7) the kurtosis.
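The windowing and statistics of Eqs. (2) and (3) can be sketched as follows. The non-overlapping window layout and the synthetic input are assumptions for illustration; the paper does not state the window stride:

```python
import numpy as np

def window_features(attr, win=100):
    """Per-window statistics for one seismic attribute (cf. Eqs. 2-3).

    attr: 2-D array of normalized attribute values (samples x traces).
    Returns one 7-element feature vector per non-overlapping win x win window.
    """
    rows = []
    for i in range(0, attr.shape[0] - win + 1, win):
        for j in range(0, attr.shape[1] - win + 1, win):
            w = attr[i:i + win, j:j + win].ravel()
            m, s = w.mean(), w.std()
            z = (w - m) / s if s > 0 else np.zeros_like(w)
            rows.append([
                np.percentile(w, 20),     # 20th percentile
                np.percentile(w, 80),     # 80th percentile
                m,                        # mean
                s,                        # standard deviation
                s / np.sqrt(w.size),      # standard error of the mean
                (z ** 3).mean(),          # skewness
                (z ** 4).mean() - 3.0,    # excess kurtosis
            ])
    return np.asarray(rows)

# A synthetic 200 x 300 "attribute" yields a 2 x 3 grid of windows.
feats = window_features(np.random.default_rng(1).normal(size=(200, 300)))
```

Each window thus collapses 10 000 attribute samples into 7 numbers, and concatenating these over the 15 attributes gives the feature vector for that window.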

Regularization

Figure 5: Hyperparameters selected based on a trade-off between model accuracy and simplicity for different algorithms during training. Grey areas indicate the standard deviation of accuracy between different folds (i.e., splits) during the cross-validation.

Using a large number of features can result in overfitting, where an overly complex model describes random errors or noise in the data. To avoid overfitting, we regularize our models, when possible, during training. Training during machine-learning usually involves the minimization of the in-sample error E_in, i.e., the difference between the predicted (y_pred) and the actual (y) result:

min E_in(y_pred, y).   (4)

Regularization introduces an additional constraint on the set of models:

min [E_in(y_pred, y) + λ · Ω],   (5)

where λ is the regularization parameter and Ω the penalty function. The regularization parameter was selected based on a trade-off between model accuracy and simplicity during training (see Figure 5). While we conduct no explicit feature selection, regularization can be regarded as an implicit method to constrain features.
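The text does not specify the penalty function Ω used for each model, so as a generic illustration the sketch below minimizes a squared in-sample error plus an L2 penalty (ridge regression) and shows that a larger regularization parameter yields a simpler (smaller-norm) model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression with many noisy features, which invites overfitting.
X = rng.normal(size=(50, 20))
y = X[:, 0] + 0.1 * rng.normal(size=50)   # only feature 0 actually matters

def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 (Eq. 5 with an L2 penalty)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_weak = ridge_fit(X, y, lam=1e-3)    # almost unregularized
w_strong = ridge_fit(X, y, lam=1e3)   # heavily regularized: shrunken weights
```

Sweeping lam and cross-validating the accuracy at each value reproduces the accuracy-versus-simplicity trade-off shown in Figure 5.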

Training

In this phase, we train several models to classify seismic facies using the training data from our manual interpretation. Training itself involves the minimization of the in-sample error, i.e., the difference between the predicted (y_pred) and the known result (y) (see Eq. 4). Because we distinguish between four seismic facies, we conduct a multi-class classification where the model output comprises four discrete classes (A, B, C, and D). While some classifiers can inherently handle multi-class problems, binary classifiers require one-vs-all or one-vs-one strategies to predict more than two classes. By covering the most common algorithms used for multi-class classification (see Table 2), we are able to compare their performance on this data set (Figure 5).
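The one-vs-all strategy can be illustrated with a toy stand-in for the binary learners: one confidence score per facies class, with the most confident class winning. The scorer below is simply the negative distance to a class mean, which is not one of the algorithms from Table 2; a real system would train one binary classifier per class:

```python
import numpy as np

rng = np.random.default_rng(3)

# Four facies classes (A-D) as clusters in a toy 2-D feature space.
classes = ["A", "B", "C", "D"]
centres = {"A": (0, 0), "B": (5, 0), "C": (0, 5), "D": (5, 5)}

X = np.vstack([rng.normal(centres[c], 0.5, size=(50, 2)) for c in classes])
y = np.repeat(classes, 50)

# "Train" one scorer per class: here, just the class mean.
means = {c: X[y == c].mean(axis=0) for c in classes}

def predict(x):
    """Each one-vs-all scorer reports a confidence; the highest one wins."""
    scores = {c: -np.linalg.norm(x - means[c]) for c in classes}
    return max(scores, key=scores.get)
```

The same wrapper logic turns any binary classifier with a confidence output into a four-class facies predictor.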

To improve the performance, we explore different kernels for some of the algorithms (see Table 2). Classification problems often become easier when we transform a feature vector (x) into a high-dimensional space (φ(x)). Explicit feature transformations (φ) can, however, be computationally expensive. Kernel functions (K) allow an implicit use of these high-dimensional spaces by calculating inner products between feature pair images:

K(x, x') = ⟨φ(x), φ(x')⟩.   (6)
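Eq. (6) can be checked numerically for a simple case: the homogeneous quadratic kernel K(x, z) = (x·z)² equals the inner product of explicit degree-2 feature maps. This example is illustrative and is not one of the kernels listed in Table 2:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def kernel(x, z):
    """Homogeneous quadratic kernel: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The kernel evaluates the inner product in the transformed 3-D space
# without ever forming phi(x) explicitly.
same = np.isclose(kernel(x, z), np.dot(phi(x), phi(z)))
```

For maps into much higher (or infinite) dimensional spaces, such as the radial basis function kernel, only the kernel form remains computationally feasible.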

Table 2: Classification algorithms with their strategy, kernel, tuned hyperparameter (name, range, and selected value), and default settings.

Adaboost (multi-class). Tuned hyperparameter: Max. depth, range [1, 10], selected: 3. Default settings: base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None.

Decision Tree (multi-class). Tuned hyperparameter: Max. depth, range [1, 10], selected: 7. Default settings: criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False.

Extra Trees (multi-class). Tuned hyperparameter: Estimators, range [1, 20], selected: 3. Default settings: n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None.

Gaussian Process (Cubic Kernel) (one-vs-all; kernel: cubic polynomial). No tuned hyperparameter. Default settings: alpha=1e10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None.

Gaussian Process (Radial Basis Function) (one-vs-all; kernel: radial basis function). No tuned hyperparameter. Default settings: alpha=1e10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None.

Gradient Boosting (multi-class). Tuned hyperparameter: Estimators, range [1, 40], selected: 4. Default settings: loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto'.

K-nearest Neighbor (one-vs-all). Tuned hyperparameter: Neighbors, range [1, 20], selected: 4. Default settings: radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, n_jobs=1, **kwargs.

Linear Discriminant Analysis (multi-class). No tuned hyperparameter. Default settings: solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001.

Logistic Regression (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: penalty='l2', dual=False, tol=0.0001, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1.

Neural Network (Identity) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Neural Network (Logistic) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 0.1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Neural Network (Rectified Linear Unit) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Neural Network (Hyperbolic Tangent) (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 1. Default settings: hidden_layer_sizes=(100, ), activation='relu', solver='adam', batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08.

Quadratic Discriminant Analysis (one-vs-all). Tuned hyperparameter: Regularization, range [-2, 2], selected: 0. Default settings: priors=None, store_covariance=False, tol=0.0001, store_covariances=None.

Random Forest (multi-class). Tuned hyperparameter: Estimators, range [1, 50], selected: 3. Default settings: criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None.

Stochastic Gradient Descent (one-vs-all). Tuned hyperparameter: Regularization, range [10e-5, 10e5], selected: 10e-4. Default settings: loss='hinge', penalty='l2', l1_ratio=0.15, fit_intercept=True, max_iter=None, tol=None, shuffle=True, verbose=0, epsilon=0.1, n_jobs=1, random_state=None, learning_rate='optimal', eta0=0.0, power_t=0.5, class_weight=None, warm_start=False, average=False, n_iter=None.

Support Vector Machine

(Cubic)

one-vsone

Cubic

Polynomial

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

10e-5

Support Vector Machine

(Linear)

one-vsone

Linear

Function

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

1

Support Vector Machine

(Quadratic)

one-vsone

Quadratic

Polynomial

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

10e-4

Support Vector Machine

(Radial Basis Function)

one-vsone

Radial Basis

Function

degree=3, gamma=Ã¢â‚¬â„¢autoÃ¢â‚¬â„¢, coef0=0.0, shrinking=True, probability=False, t

ol=0.001, cache_size=200, class_weight=None, verbose=False, max_ite

r=-1, decision_function_shape=Ã¢â‚¬â„¢ovrÃ¢â‚¬â„¢, random_state=None

Regularization

[10e-5, 10e5]

1

Table 2: Algorithms, default- and hyperparameters in this study. The hyperparameters were selected based on a trade-off between model accuracy and simplicity (see Figure 5). For more information, we refer the reader to the scikit-learn package (Pedregosa et al., 2011).
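As a concrete illustration of this kind of hyperparameter sweep, a cross-validated grid search over the regularization parameter of a cubic-kernel SVM could be sketched with scikit-learn's GridSearchCV. This is a hypothetical sketch, not the paper's code: the data here are synthetic stand-ins for the seismic attributes, and the grid is a shortened version of the [10e-5, 10e5] range in Table 2, for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the seismic-attribute data (four classes).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# Sweep the regularization parameter C with ten-fold cross-validation
# and keep the value with the best average test accuracy.
search = GridSearchCV(SVC(kernel='poly', degree=3, gamma='auto'),
                      param_grid={'C': np.logspace(-4, 2, 7)},
                      cv=10)
search.fit(X, y)
print(search.best_params_['C'], round(search.best_score_, 3))
```

The selected value then depends on both accuracy and, as the caption notes, a preference for simpler (more strongly regularized) models.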

As such, kernels allow us to use high-dimensional feature spaces without specifying them or the explicit transformation. Here, we use polynomial ($k_{\mathrm{poly}}$) and radial basis kernel functions ($k_{\mathrm{rbf}}$):

$$k_{\mathrm{poly}}(x_i, x_j) = (x_i \cdot x_j + c)^d \quad (7)$$

$$k_{\mathrm{rbf}}(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right) \quad (8)$$

in combination with Support Vector Machine, Gaussian Process, and Neural Network classifiers

(see Table 2).
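A minimal NumPy sketch of these two kernel functions may make the formulas concrete; the symbol names c, d, and gamma follow the standard kernel definitions, as the paper's exact notation did not survive this copy:

```python
import numpy as np

def polynomial_kernel(xi, xj, c=1.0, d=3):
    # k(xi, xj) = (xi . xj + c)^d; d=3 gives the cubic kernel of Table 2.
    return (np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # k(xi, xj) = exp(-gamma * ||xi - xj||^2)
    diff = np.asarray(xi) - np.asarray(xj)
    return np.exp(-gamma * np.sum(diff ** 2))

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
print(polynomial_kernel(x1, x2))  # (0 + 1)^3 = 1.0
print(rbf_kernel(x1, x1))         # identical points give 1.0
```

Both functions return a similarity score without ever constructing the high-dimensional feature vectors explicitly, which is the point of the kernel trick.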

Cross-validation

During validation, we determine the model performances on yet unseen data. A simple holdout validation splits the data into two subsets: one for training and one for testing. This approach, however, leads to a dilemma, as we would like to maximize both subsets: the training set to generate well-constrained models, and the test set to obtain reliable estimates of model performance. This dilemma is resolved by cross-validation, i.e., splitting the data multiple times and averaging performance estimates between folds. We apply a ten-fold stratified cross-validation, where the data is split into a training set (90% of the data) and a test set (10% of the data) ten times, while preserving the percentage of examples of each class. To visualize model performances, we calculate an average confusion matrix for each model (Figure 6).
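The ten-fold stratified scheme described above can be sketched with scikit-learn's StratifiedKFold; the labels here are synthetic, with four balanced classes standing in for the facies:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Four balanced classes, 100 examples each (stand-in for facies A-D).
y = np.repeat([0, 1, 2, 3], 100)
X = np.random.RandomState(0).randn(len(y), 5)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, test_idx in skf.split(X, y):
    # Each fold: 90% training, 10% testing, class proportions preserved.
    fold_sizes.append(len(test_idx))
    assert np.bincount(y[test_idx]).tolist() == [10, 10, 10, 10]
print(fold_sizes)  # ten test folds of 40 examples each
```

Stratification is what guarantees that every fold contains the same fraction of each class, so the averaged confusion matrices are not biased by uneven class sampling.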

Figure 6: Confusion matrices of trained models showing the average number of correct classifications on the diagonal and the average number of incorrect classifications off the diagonal. Averages result from ten-fold cross-validation. Note that the classes were balanced, so that the confusion matrices visualize class-wise model accuracies.

# | Model | Facies A Pre/Rec/f1 | Facies B Pre/Rec/f1 | Facies C Pre/Rec/f1 | Facies D Pre/Rec/f1 | Overall Pre/Rec/f1 | Acc
1 | Support Vector Machine (Cubic) | 0.995/0.985/0.990 | 0.975/0.992/0.983 | 0.992/0.974/0.983 | 0.971/0.982/0.977 | 0.983/0.983/0.983 | 0.983±0.004
2 | Support Vector Machine (Radial Basis Function) | 0.995/0.984/0.989 | 0.974/0.994/0.984 | 0.991/0.973/0.982 | 0.973/0.980/0.977 | 0.983/0.983/0.983 | 0.983±0.004
3 | Support Vector Machine (Quadratic) | 0.993/0.984/0.989 | 0.974/0.992/0.983 | 0.992/0.972/0.982 | 0.971/0.982/0.976 | 0.983/0.982/0.982 | 0.983±0.004
4 | Support Vector Machine (Linear) | 0.993/0.984/0.988 | 0.973/0.988/0.980 | 0.991/0.975/0.983 | 0.970/0.979/0.974 | 0.982/0.981/0.981 | 0.982±0.004
5 | K-nearest Neighbor | 0.991/0.987/0.989 | 0.979/0.988/0.983 | 0.985/0.974/0.979 | 0.972/0.977/0.974 | 0.982/0.981/0.981 | 0.982±0.003
6 | Neural Network (Rectified Linear Unit) | 0.994/0.982/0.988 | 0.966/0.992/0.979 | 0.988/0.976/0.982 | 0.975/0.970/0.972 | 0.980/0.980/0.980 | 0.980±0.005
7 | Neural Network (Logistic Activation Function) | 0.990/0.984/0.987 | 0.972/0.982/0.977 | 0.988/0.975/0.982 | 0.968/0.975/0.971 | 0.980/0.979/0.979 | 0.980±0.004
8 | Extra Tree | 0.986/0.985/0.986 | 0.975/0.982/0.979 | 0.981/0.974/0.977 | 0.969/0.971/0.970 | 0.978/0.978/0.978 | 0.978±0.004
9 | Random Forest | 0.987/0.983/0.985 | 0.974/0.982/0.978 | 0.982/0.974/0.977 | 0.969/0.972/0.971 | 0.978/0.978/0.978 | 0.978±0.004
10 | Neural Network (Hyperbolic Tangent Activation Function) | 0.990/0.984/0.987 | 0.969/0.983/0.976 | 0.983/0.976/0.979 | 0.970/0.967/0.968 | 0.978/0.977/0.977 | 0.978±0.003
11 | Gradient Boosting | 0.986/0.983/0.984 | 0.973/0.979/0.976 | 0.980/0.975/0.977 | 0.969/0.969/0.969 | 0.977/0.977/0.977 | 0.977±0.003
12 | Logistic Regression | 0.984/0.980/0.982 | 0.968/0.977/0.973 | 0.982/0.976/0.979 | 0.970/0.969/0.969 | 0.976/0.976/0.976 | 0.976±0.003
13 | Quadratic Discriminant Analysis | 0.984/0.985/0.985 | 0.970/0.983/0.976 | 0.947/0.987/0.966 | 0.985/0.928/0.956 | 0.971/0.971/0.971 | 0.971±0.004
14 | Decision Tree | 0.986/0.976/0.981 | 0.963/0.981/0.972 | 0.978/0.962/0.970 | 0.957/0.965/0.961 | 0.971/0.971/0.971 | 0.971±0.003
15 | Neural Network (Identity Activation Function) | 0.971/0.973/0.972 | 0.957/0.962/0.959 | 0.983/0.976/0.979 | 0.968/0.966/0.967 | 0.970/0.969/0.969 | 0.970±0.005
16 | Adaboost | 0.977/0.972/0.974 | 0.966/0.972/0.968 | 0.972/0.964/0.968 | 0.961/0.964/0.962 | 0.969/0.968/0.968 | 0.969±0.011
17 | Linear Discriminant Analysis | 0.980/0.968/0.974 | 0.951/0.975/0.963 | 0.983/0.958/0.970 | 0.954/0.964/0.959 | 0.967/0.967/0.967 | 0.967±0.004
18 | Gaussian Process (Cubic) | 0.973/0.972/0.972 | 0.965/0.956/0.960 | 0.973/0.964/0.968 | 0.943/0.956/0.948 | 0.964/0.962/0.962 | 0.964±0.004
19 | Stochastic Gradient Descent | 0.957/0.974/0.965 | 0.957/0.940/0.948 | 0.993/0.945/0.969 | 0.934/0.977/0.955 | 0.960/0.959/0.959 | 0.960±0.004
20 | Gaussian Process (Radial Basis Function) | 0.762/0.504/0.525 | 0.587/0.724/0.628 | 0.889/0.752/0.789 | 0.701/0.660/0.647 | 0.735/0.660/0.647 | 0.735±0.127

Table 3: Parameters describing model performance on each class (i.e., facies) and overall, i.e., precision, recall, f1-score, and support as well as

accuracy. Models are sorted by accuracy. Standard deviations of model accuracies between different folds of cross-validation are listed in the last

column. Note that models are balanced with a support of 2500 for each class.

To quantify the model performance, we calculate: (1) precision, (2) recall, and (3) f1-score, and their averages for each model (see Table 3). Precision describes the ability of classifiers to predict classes correctly. Recall (or sensitivity) describes the ability of classifiers to find all examples of a class. The f1-score is an equally weighted harmonic mean of precision and recall, and support is simply the number of examples of each class. Furthermore, we calculate the average accuracy of each model and determine its standard deviation between folds (see Table 3). Note that regularization, training, and cross-validation were implemented in Python using the scikit-learn package (Pedregosa et al., 2011).
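The three per-class metrics can be written out directly from their definitions; this is a from-scratch sketch rather than the paper's code (scikit-learn's precision_recall_fscore_support computes the same quantities):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, cls):
    """Per-class precision, recall, and f1-score for class `cls`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))  # true positives
    fp = np.sum((y_pred == cls) & (y_true != cls))  # false positives
    fn = np.sum((y_pred != cls) & (y_true == cls))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred, cls=0)
print(p, r, f)  # precision 2/3, recall 2/3, f1 2/3
```

Support, the remaining quantity in Table 3, is simply `np.sum(np.asarray(y_true) == cls)` for each class.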

Model selection

Model selection is based on the generalization performance of trained models on test data. In our case, the model using a support vector machine with a cubic kernel function shows the highest accuracy, precision, and recall of all models (see Table 3, Figure 7). This means that this model not only classifies seismic facies most accurately, but is also the best at avoiding incorrect classifications (Figure 6).

Application

After model selection, it is recommended to retrain the best model using the entire data set available, i.e., training plus test data (Abu-Mostafa et al., 2012). This final model is subsequently applied to the entire seismic section (Figure 7).
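That final step, retraining the selected model on all available data and then classifying every sample, might look like the following sketch. The data are synthetic placeholders; the cubic-kernel SVC and its regularization value (10e-5) follow Table 2:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Training plus test data combined (synthetic stand-in for labelled examples).
X_all, y_all = make_classification(n_samples=400, n_features=8,
                                   n_informative=5, n_classes=4,
                                   random_state=1)

# Retrain the selected model (cubic-kernel SVM) on the full data set.
final_model = SVC(kernel='poly', degree=3, gamma='auto', C=1e-4)
final_model.fit(X_all, y_all)

# Apply it to every sample of the (here: synthetic) seismic section.
X_section = np.random.RandomState(2).randn(1000, 8)
facies = final_model.predict(X_section)
print(facies.shape)  # one predicted facies label per section sample
```

Because no data is held out at this stage, the reported performance numbers still come from the earlier cross-validation, not from this final fit.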

Figure 7: a) Seismic section and b) final classification result. Seismic data courtesy of CGG.

CLASSIFICATION RESULTS

Test

Test results, obtained during cross-validation, include confusion matrices and descriptive metrics. Confusion matrices visualize the precision of models for each class (Figure 6). When the model predicts the correct class, the sample contributes to the diagonal of the confusion matrix; when the model predicts the wrong class, the sample contributes to the off-diagonal cells. The first element of the confusion matrix (top left) shows the precision for the first class (i.e., horizontal); the second element (first row, second column) shows the percentage of samples classified into the second class (i.e., dipping) despite belonging to the first class (i.e., horizontal), and so on. As such, confusion matrices show how well each model predicts each class. In general, the observed variations between models and classes are minor (
