Nonlinear full waveform inversion of wide-aperture OBS data for Moho structure using a trans-dimensional Bayesian method

before inversion, trans-dimensional MCMC allows the ﬂexibility for an automatic estimation of both the model complexity (e.g. the number of velocity interfaces) and the velocity–depth structure from the data. We ﬁrst test the algorithm on synthetic data using four representative Moho models and then apply to an ocean bottom seismometer (OBS) data from the Mid-Atlantic Ocean. A 2-D ﬁnite-difference solution of an acoustic wave equation is used for data simulation at each iteration of MCMC search, for taking into account the lateral heterogeneities in the upper crust, which is constrained from traveltime tomography and is kept unchanged during inversion; the 1-D model parametrization near Moho enables an efﬁcient search of the trans-dimensional model space. Inversion results indicate that, with very little prior and the wide-aperture seismograms, the trans-dimensional FWI method is able to infer the posterior distribution of both the number of velocity interfaces and the velocity–depth model for a strong nonlinear problem, making the inversion a complete data-driven process. The distribution of interface matches the velocity discontinuities. We ﬁnd that the Moho in the study area is a transition zone of 0.7 km, or a sharp boundary with velocities from around 7 km s − 1 in the lower crust to 8 km s − 1 of the upper mantle; both provide nearly identical waveform match for the ﬁeld data. The ambiguity comes from the resolution limit of the band-limited seismic data and limited offset range for PmP arrivals.


I N T RO D U C T I O N
Seismic full waveform inversion (FWI) is a state-of-the-art method for estimating high-resolution quantitative subsurface images by fully exploiting the recorded waveform (Tarantola 1984;Pratt et al. 1998;Shipp & Singh 2002;Fichtner et al. 2006;Virieux & Operto 2009;Ray et al. 2017).It is based on minimizing the difference between observed and synthetic data, with a numerical solution of the complete wave equation for realistic simulation of seismic wave propagation.A gradient-based linearized iterative inversion approach is widely used, where the adjoint method is applied to calculate the gradients of the data misfit by cross-correlation of the forward and adjoint wavefields (Tromp et al. 2005;Fichtner et al. 2006;Plessix 2006).The linearized approach has been used for first arrival turning rays (refraction) and pre-critical reflection data (Shipp & Singh 2002;Zelt et al. 2003;Vigh et al. 2014).However, with limited data coverage and frequency bandwidth, FWI is a highly nonlinear and non-unique problem (Bozdag et al. 2011).Previous studies suggest that the final outcome from FWI under linear assumption highly depends on the starting model, and the iterative process can get trapped easily in a local minima (Bozdag et al. 2011;Métivier et al. 2016).
In the presence of a high-velocity contrast, such as at Moho, the reflection coefficient and recorded waveforms from wide-aperture seismic acquisition are extremely nonlinear.The elastic reflection coefficient becomes complex starting from the critical angle, with dramatic changes in both the amplitude and phase of reflected waves at near-and post-critical incident angles, which do not lend themselves to linearization (Aki & Richards 1980).Significant changes in the waveform shape can occur at critical angles, where the waveform of a reflection will be the corresponding Hilbert-transform of the incident wave.The problem at the Moho is further complicated by the interference of lower crustal (Pg) and upper mantle (Pn) turning ray arrivals with the critically reflected Moho arrivals (PmP; McGlashan et al. 2008;Buehler & Shearer 2017).In order to determine velocity structure near Moho, a nonlinear method should be used.
Bayesian inference using Markov chain Monte Carlo (MCMC) is a fully nonlinear inversion method based on an extensive search of the model space, which is able to remove the dependence on the starting model, and also provide uncertainty analysis for the inversion results.The feasibility of seismic inversion using MCMC sampling has been demonstrated by several authors (Cary & Chapman 1988;Mosegaard & Tarantola 1995;Sambridge & Mosegaard 2002;Fichtner & Zunino 2019), with a recently growing interest of using advanced MCMC methods for applications in geoscience, including, for example, trans-dimensional (reversible-jump) MCMC (Green 1995;Malinverno 2002;Bodin & Sambridge 2009;Sambridge et al. 2013;Hawkins & Sambridge 2015;Saygin et al. 2015;Dettmer et al. 2016;Galetti et al. 2017;Ray et al. 2017;Guo et al. 2020) and Hamiltonian MCMC (Sen & Biswas 2017;Fichtner et al. 2018;Gebraad et al. 2020).
With regard to velocity model building near Moho using Bayesian methods, Cary & Chapman (1988) used MCMC sampling for inverting Moho models and associated error analysis from seismic refraction waveforms, where the parameters being estimated were the depths of a number of fixed velocities.The fixed velocities and the number of layers limit the flexibility of the inversion algorithm, making the search space affected by subjective human intervention.Mosegaard et al. (1997) applied the MCMC technique for Moho structure inversion using near-normal-incidence reflections, where the problem is only moderately nonlinear.Besides MCMC sampling, other nonlinear global optimization methods have also been used for addressing the strong nonlinearity observed in the Moho model inversion.Sambridge & Drijkoningen (1992) applied a genetic algorithm for 1-D Moho model inversion from marine refraction data, and showed better convergence rate than the classic MCMC method.
Inversion methods require solving the forward modelling problem for computing the predicted seismic data, for a given earth model at each iteration.Nonlinear inversion using MCMC sampling usually requires a large number of iterations for convergence.Limited by computational resources, approximations for wave equations are usually used in previous studies for nonlinear seismic inversion, especially when targeted for deep earth model building such as Moho.Mosegaard et al. (1997) used a propagator matrix method (Ganley 1981) for predicting seismograms at each iteration of MCMC sampling.The WBKJ method (Chapman 1978;Chapman & Orcutt 1985) was used for efficiently calculating synthetic seismograms in both Cary & Chapman (1988) and Sambridge & Drijkoningen (1992).Cary & Chapman (1988) compared the accuracy of the WKBJ method with synthetic seismograms calculated from the reflectivity method, and indicated that the it was accurate enough for the data used in their study.The solution from WKBJ is, nonetheless, still an approximation to the full solution of wave equation, therefore inaccuracy may appear in the presence of strong velocity variations (Shaw 1986), such as the Moho boundary, which causes the systematically smoothed models of high-velocity gradient from inversion as discussed in Cary & Chapman (1988).Therefore, a more accurate solution for forward modelling should be used for the nonlinear Moho FWI problem, when the capability of high-performance computing (HPC) allows.
The fast development of HPC capabilities enables the emergence and accelerated application of advanced model space search algorithms for inversion, and using more accurate forward modelling methods for better description of the wave physics.In the present study, we use a trans-dimensional MCMC method for inferring the posterior distribution of earth models near Moho, which is idealized with a 1-D model assumption for simplicity and illustration, from mainly PmP phase around critical angles in the wide-aperture seismic data that also contain Pg and Pn arrivals.Besides the velocitydepth structure of the earth model, model dimension (the number of velocity interfaces) is also a variable to be inferred from data, which makes the outcome from trans-dimensional FWI a more complete solution for the inversion problem.At each iteration of the MCMC chain, a time-domain finite-difference solution of 2-D wave equation is used for predicting seismograms, for taking into account the relatively strong horizontal heterogeneities in the upper crust.We provide synthetic studies using four representative Moho structure models, followed by a field data study using a long-offset (approximately 40 km) OBS seismic data from the Mid-Atlantic Ocean for inverting Moho structures for crust formed at slow spreading Mid-Atlantic Ridge (MAR).

M E T H O D O L O G Y
In this section, we formulate the FWI problem in a Bayesian framework, where all the information involved is represented using probabilistic terms (Gouveia & Scales 1998).In contrast to obtaining a single final physical model of the subsurface using common linearized inversion methods, from the observed seismic data d, we aim to estimate the posterior distribution of the seismic velocity model m, from which we can derive a probabilistic description of 1058 P. Guo et al. the earth model, including for example, the 'optimal' solution and uncertainty quantification.
Based on the Bayes' theorem, the posterior distribution can be defined as where p(m) is the prior distribution of the model parameters, which represents the knowledge about the model before measuring the data.p(d|m) is the likelihood function, which is the probability of observed data d with simulated data given model m.Thus, the posterior distribution reflects how the prior knowledge about the model parameters is updated by the observed data, which, in the context of FWI, is the recorded waveform information.p(d) is a normalizing constant also known as evidence, that makes the integral of the posterior distribution equal to unity.The computation of p(d) is usually prohibitive, as it requires integration over the model space.
The most common technique for Bayesian inference is probably MCMC.Because it only requires computation of the probability ratio for pairs of models, and p(d) is independent of a specific model, we can neglect the evidence term and replace eq. ( 1) with p(m|d) ∝ p(d|m) p(m). (2)

Model parametrization
MCMC requires a large number of iterations for sampling the posterior distribution, and the required iteration can increase dramatically in an exponential scale with the increasing number of parameters [model dimension; the curse of dimensionality (Sambridge et al. 2006)].Seismic inversion is typically formulated as a highdimensional problem, therefore a sparse model parametrization is required for an efficient search of the model space.
The velocity structure near Moho (the lower crust, Moho and the upper mantle) can be approximately considered as horizontally invariant within the offset range of a common OBS gather for OBS data.Therefore we use a 1-D model assumption defined by a number of velocity nodes (interface discontinuities); these velocity nodes are used for interpolating velocity values at depths between two neighbouring nodes.The model in depth between two neighbouring nodes can be considered as a layer.
We use cubic spline interpolation for deriving 1-D velocity models from the sparsely distributed velocity nodes for simplicity (Fig. 1a).Cubic spline interpolation is able to describe smoothing velocity transitions, as is the case for many models of the subsurface, but may have difficulties when there is an abrupt (e.g. a step function) velocity change, such as at Moho.Thus we introduce a second model parametrization, that the cubic spline interpolation is applied for the shallower part, and from certain depth (9 km in this study) of the lower crust to the upper mantle, a step function is used for interpolating velocities between two neighbouring velocity nodes.With a step function, the velocities between two neighbouring velocity nodes are equal to the value of the node at greater depth (Fig. 1b).We refer to the second approach for defining velocity models as a 'hybrid' (cubic spline interpolation for shallower part and a step function for greater depth) model parametrization.
Besides inverting for the properties (including depth and velocity) of velocity nodes from the observed seismic data, the transdimensional MCMC method is also able to infer the number of velocity nodes (model dimensions) required for describing the velocity structure complexity, as indicated by the name 'trans-dimensional'.
We will introduce the specific Bayesian method in the following section.

Trans-dimensional MCMC
Given observed data, MCMC sampling is an iterative process to generate model samples from a target distribution, where each of the models is a perturbation of the previous one.We use transdimensional MCMC sampling to explore the model space (Green 1995;Malinverno 2002;Agostinetti & Malinverno 2010;Burdick & Lekić 2017;Killingbeck et al. 2018).The attractive property of this particular sampling method is that, besides physical properties of velocity models (velocity-depth structure for 1-D model), the model dimension (the number of velocity nodes for our case) is also a variable to be inferred from data; thus the algorithm greatly simplifies the preparation work for determining such parameters (Ray et al. 2016;Dadi et al. 2018;Hawkins et al. 2019).Transdimensional MCMC can be considered as an extension of the popular Metropolis-Hastings algorithm, with proposal functions that can increase or decrease model dimension (Bodin et al. 2012).Compared with conventional MCMC method for seismic inversion that requires determining the number of unknowns (e.g. the number of layers for 1-D model) before inversion starts (Mosegaard et al. 1997), trans-dimensional MCMC provides a more complete data-driven solution for the FWI problem, with outcomes including posterior distribution for both the model complexity and physical properties.
In the context of 1-D velocity model inversion, we list the four general types of model update proposals for creating new models in the implementation of the trans-dimensional MCMC as (i) Birth: a new velocity node is inserted at a random depth (increasing of model dimension), and therefore a randomly selected layer is divided into two.
(ii) Death: a randomly selected velocity node is removed (decreasing of model dimension), and therefore two neighbouring layers are merged together.
(iii) Node depth: depth of a randomly selected velocity node is adjusted to shallower or deeper depth, using perturbation values from a Gaussian distribution.
(iv) Seismic velocity: the velocity of a randomly selected velocity node is adjusted, with perturbation values from a Gaussian distribution.
The sampled model at each iteration of MCMC sampling is a series of velocity nodes with different velocity values, which are distributed at different depths.As defined before, a node represents a velocity discontinuity, and the velocity between two neighbouring nodes can be defined using interpolation; the 1-D model between two neighbouring nodes can be viewed as a layer.The interpolated velocity-depth model from the sparsely distributed velocity nodes is used for seismic modelling for predicting data.
For trans-dimensional FWI, the starting velocity model of a chain is selected randomly from the prior distribution.At each iteration of an MCMC chain, a new model is proposed by perturbing the current state of model parameters, using a proposal function listed above.The new model will either be rejected, or accepted as the latest member of the Markov chain depending on the acceptance criterion ratio (Green 1995;Ray et al. 2017;Hawkins et al. 2019)  as where the term p(m ) p(m) is the prior ratio, p(d|m ) p(d|m) the likelihood ratio and Q(m →m) Q(m→m ) the proposal ratio.The proposal ratio measures the probability that perturbs the current model m to obtain the new proposed model m , calculated by the posterior distribution evaluated at m to m multiplied by the ratio for the reverse step from m to m (Sambridge et al. 2006;Dettmer et al. 2010;Zhu & Gibson 2018).The difference in the acceptance ratio with the common Metropolis-Hastings algorithm is the extra term |J|, the determinant of the Jacobian when transformation from m to m , which is required to preserve volume when the transformation involves a jump between dimensions (Green 2003;Bodin & Sambridge 2009).When a Markov chain is completed, the first part of the chain, that is, the 'burn-in' stage, will be discarded; after that, the chain is assumed to be in the stationary phase, where the model samples will be considered from the posterior distribution constrained by the prior and data.The model ensemble from MCMC FWI contains velocity models from the posterior distribution, from which the probabilistic properties for the inversion problem can be derived.

Forward problem
Inversion methods require solving the forward modelling problem of computing the predicted data for an earth model at each iteration.The acoustic assumption has been widely applied for simulating seismic wave propagation in FWI problems (Tarantola 1984;Virieux & Operto 2009;Kamei et al. 2013;Ray et al. 2017).We use the first-order derivative formulation with pressure and particle velocities (Auld 1973) as where p is the pressure wavefield, v is the particle velocity vector, ρ is density, and c p is the P wave velocity.In this study, the time-domain staggered-grid finite-difference (FD; Virieux 1986) is used for solving the acoustic wave equations for predicting seismic data given a model, with eighth-order accuracy in space, and second-order accuracy in time.The forward modelling solver must be numerically stable for all the models allowed by the prior, therefore a conservative choice of temporal and spatial size is chosen for satisfying the stability condition of finite-difference simulation.
The convolutional perfectly matched layer (CPML) (Martin & Komatitsch 2009) is used as the absorbing boundary condition.Note that while 1-D modelling is much more computationally efficient, here we use 2-D FD modelling for a better description of the wave physics in the oceanic crust with pronounced horizontal heterogeneities.

Likelihood function
The likelihood function is based on a least-squares measure of the data misfit, defined as r = d − f (m), where d is the observed prestack seismic shot gathers, and f (m) is the simulated seismic data with model m.The data errors are approximated by a covariance matrix C. The Gaussian likelihood function is where Ns is the number of samples in the seismic data.The covariance matrix C is usually assumed to be diagonal for seismic FWI.However, the noise is temporally correlated in seismic records, thus C is not diagonal (Sambridge 1999 P. Guo et al. 2010).We use a 1-D Gaussian process for describing the temporally correlated noise, where the element of C is parametrized with a stationary Gaussian variogram (Bodin et al. 2012;Visser et al. 2019).

The prior
The prior contains information we know about the model parameters before measuring data, and defines the model space to be searched by Markov chains; for seismic inversion, it is usually determined with information from previous work, and knowledge of the expected subsurface physical parameter ranges.
In this study, we use prior distributions controlled by mean and standard deviation parameters, which are able to provide flexibility for defining distributions over models, while at the same time keeping the number of required prior parameters small (Visser et al. 2019).
We use independent Gamma distributions as priors over each node's velocity v i , with the probability for v defined as the mean is defined μ = α/β and a standard deviation σ = μ/ √ α; the Gamma distribution has the benefit of allowing only real positive numbers.Additional constraints, including the minimum and maximum velocities, are used to avoid sampling unrealistic velocity values.
The vector of layer width w (the distance in depth between two neighbouring velocity nodes) has a symmetric Dirichlet prior distribution, where w is the distance in depth between two neighbouring velocity nodes (the layer width); the vector of layer width contains k values, with their sum equal to the maximum model depth L for inversion.B(ν, k) is a Beta function serving as a normalizing constant.ν is a parameter defining the shape of the prior density; we set ν = 2 for simplicity.
A Poisson distribution is used as the prior for the model dimension k (the number of velocity nodes / interfaces) as where λ is the expected number of velocity nodes.We also set a maximum value of N.

S Y N T H E T I C E X A M P L E S
The seismic discontinuity at Moho is a primary boundary between the crust and upper mantle, defined mainly by seismic wave velocities.In general, the P wave seismic velocity in the lower crust is about 7 km s −1 , and in the upper mantle around 8 km s −1 ; therefore, a large velocity contrast at Moho up to 1-1.5 km s −1 is expected (Grad et al. 2009).We use four 1-D velocity profiles, shown in Fig. 2, for synthetic studies.These velocity models represent four different scenarios for the Moho structure, with (a) a sharp boundary, (b) a smooth transition, (c) step functions with increasing velocity and (d) with low-velocity anomalies in the upper mantle.A 1-D velocity profile is extended to a 2-D horizontally invariant velocity model for imaging with 2-D seismic FWI using far-offset seismic data with critically reflected waves and upper-mantle refractions.Fig. 3 shows the acquisition geometry for generating seismic data in synthetic studies.The ocean bottom seismometer (OBS) is located at 3.25 km depth and 0.5 km model distance, on the seafloor of a sedimentary layer.The water depth is 3.25 km, and the oceanic crust is about 6 km thick.There are 120 sources, evenly distributed from offsets 12 to 36 km.We use this offset range because best quality of OBS signal are observed in this offset range for critically reflected PmP arrivals.For the purpose of seismic modelling, the OBS, which is used for recording seismic waves, is considered as the 'source' using reciprocity, and the sources near to the water surface are treated as 'receivers'.The offset corresponding to the critical angle for the Moho reflection is around 28 km.For inversion, the velocities shallower than 6 km in depth (about 2 km from ocean bottom) are set to the correct value because the relatively far offset data we used has fewer constraints for the upper crust; we only invert for those of the deeper (from 6 to 12 km) part of the model, including the lower crust, Moho and the upper mantle.
Two kinds of model parametrizations are used for interpolating velocity values at finite-difference grids from the sparsely distributed velocity nodes from MCMC search, i.e., the cubic spline interpolation method and the hybrid [cubic spline interpolation (6-9 km depth) and step function (9-12 km depth)] method, as described in the previous section.For all the four examples, we use 20 independent trans-dimensional MCMC chains, each with 100 000 iterations.The MCMC sampler starts from a constant velocity model of 6.6 km s −1 , which is very far away from the true solution.The first 60 000 iterations are considered as the 'burn-in' stage and the sampled models are discarded; only the velocity models from the final 40 000 iterations of each chain are included in the recorded ensemble of models.
The prior for the number of velocity nodes is a Poisson distribution, with 10 as the expected number and a maximum value of 20.The vector of layer width (the distance in depth between two neighbouring velocity nodes) is defined with a Dirichlet prior distribution, with the sum equal to 6 km.Prior for the velocity values are bounded between 6 and 8.6 km s −1 with a mean of 7 km s −1 and a standard deviation of 1 km s −1 for a Gamma distribution for all the depths.

Example 1
Fig. 4(b) shows an OBS gather generated using the velocity model of Moho structure with a sharp boundary (in Fig. 4a, the same with Fig. 2a), and Fig. 4(d) shows the predicted OBS gather using the starting model (Fig. 4c) for the MCMC sampler; Moho reflection (PmP) and refraction (Pn) is missing in the data from the starting model.The seismic data (Fig. 4b) from the true model added with random temporally correlated noise (Bodin et al. 2012) is used as the 'observed' data for inversion, with a comparison of a few selected seismic traces with and without adding noise shown in Fig. 5.The signal-to-noise ratio (SNR; calculated as the ratio between the maximum PmP amplitude and the root mean square in a 1.0 s time window before the PmP arrival) is 4.8.
Velocity-depth profiles are interpolated from sparsely distributed velocity nodes, where the required number of velocity nodes for describing the model complexity and the depth and velocity value of each node are inferred from trans-dimensional MCMC.Fig. 6 shows the posterior probability density of the interpolated velocity models using velocity nodes from trans-dimensional MCMC, with the cubic  spline interpolation (Fig. 6a) and the hybrid model parametrization (Fig. 6c), respectively.Following Mosegaard et al. (1997), we retain models every 100 iterations between accepted models for estimating the posterior distribution, because successive models from MCMC sampling usually share great similarity, and may cause bias for resolution analysis.The dashed green line is the true velocity model, where the seismic velocity at Moho changes abruptly from 7 km s −1 of the lower crust to 8 km s −1 in the upper mantle.The posterior distribution of the 1-D velocity model has higher density around the true model for both of the two parametrizations (Figs 6a and c); the Moho is clearly defined with the correct depth by most of the models in the posterior ensembles.As expected, the Moho structure is better defined using the hybrid model parametrization (compare Fig. 6c with Fig. 6a), since the step function for velocities from 9 km to greater depth better describes the abrupt velocity change at Moho.Figs 6(b) and (d) show the posterior distribution for the depths of velocity nodes from the posterior models.The velocity model between two neighbouring nodes can be interpreted as a layer, and each velocity node can be viewed as a velocity interface or discontinuity.A great portion of velocity nodes is located where there are velocity discontinuities, for example, between 9 and 10 km depth.There is a spike around 9.6 km depth in Fig. 6(d), which is consistent with the Moho interface in Fig. 6(c).Interestingly, the interface probability has another spike in the upper mantle (around 10.5 km), which corresponds to a velocity transition (Fig. 6c) constrained by refracted Pn wave.
The posterior distribution shows the non-uniqueness for seismic inversion problem.Figs 7(a) and (c) show the derived mean and the maximum posterior probability (MAP) models for the two parametrizations, respectively.Both the simulated (from the MAP model) seismograms (Figs 7b and d) from the two model parametrizations are visually identical with those from the true model for the offset of interest, which indicates the non-uniqueness of FWI problem.For a better understanding of PmP responses for different models, we also compute the seismograms for small offsets (Fig. 8), which are not included for inversion.The PmP amplitudes are much weaker than the true data in Fig. 8  tions.The number of velocity nodes in the posterior distribution is centred around 8 for the cubic spline interpolation; fewer velocity nodes are preferred in this example for defining the posterior velocity models using the hybrid model parametrization, with values around 5, because the sharp boundary in the Moho transition is better defined by using step function than cubic spline interpolation.
Fig. 10 shows the norm-2 data residuals as a function of iterations from 8 MCMC chains (randomly selected from the total 20 chains) for the cubic spline interpolation (Fig. 10a) and the hybrid model parametrization (Fig. 10c).The inversion converges after 60 000 iterations; Figs 10(b) and (d) show the zoomed-in data residual changes after convergence.The inferred velocity model ensembles in both Figs 6(a) and (c) generate predicted data with good match to the true data, with the model ambiguities (non-uniqueness) from the resolution limit of band-limited seismic data and the limited scattering angle range around critical offsets (PmP phase in the field data usually appears only near critical angles).The time-series of velocity values at 10 km depth is shown in Fig. 11.As expected, there are large velocity fluctuations at early stages of the inversion, and the samples converge to the correct value (8.05 km s −1 ) with increasing iterations.Both the data misfit and model parameters reach the state of equilibrium after 60 000 iterations, and a wellresolved structure appears in most of the sampled models.

Example 2
The second example tests the performance of trans-dimensional FWI for a Moho structure with a 0.8 km transition zone (Fig. 2b).
The posterior distribution density of the inferred velocity models (Figs 12a and c) indicates that trans-dimensional MCMC is able to find the global optima of velocity models for the strong nonlinear problem, starting from a model with constant velocity of 6.6 km s −1 (Fig. 4c) very far away from the true model; high probabilities (hot colour) follow the pattern of the true velocity model.The 0.8 km Moho transition zone with a smooth velocity gradient is defined well in Figs 12(a) and (c), using both parametrizations.The posterior for the depths of velocity nodes (Figs 12b and d) shows a large portion of nodes distributed close to and within the Moho transition zone, as expected, because large velocity changes require more nodes to define.
Fig. 13 shows the OBS gather from the true model (Fig. 13a) and the difference (Fig. 13b) with the data using the starting model (Fig. 4c) of trans-dimensional MCMC.There is waveform difference in turning waves because of velocity deviations in the lower crust and the Moho reflection.

Example 3
The Moho model used here is shown in Fig. of the Moho transition zone, because it is difficult for cubic spline to present a series of increasing velocity steps.Similar with previous examples, the velocity nodes from the posterior models concentrate where there are velocity discontinuities, especially near and in the Moho transition zone (Figs 15b and d); moreover, we observe three peaks between 9.5 and 10.6 km for the posterior model ensemble using the hybrid model parametrization, with depths consistent with the increasing velocity steps in the transition zone of the true model.
Figs 16(a) and (c) show the mean and MAP of velocity models from the posterior ensemble, using the cubic spline interpolation and the hybrid model parametrization, respectively.Both the mean and MAP contain a smoothed version of the Moho transition in Fig. 16(a); a similar pattern of velocity steps in the MAP of Fig. 16(c) with the true model can be observed.Figs 16(b) and (d) show the corresponding differences between the simulated waveforms using the two MAP models and the true data, respectively.The small differences indicate a good waveform match for both of the two MAPs, which again, illustrates the non-uniqueness for seismic inversion problem arising from the limited frequency range (3-12 Hz for the synthetic study) and limited offsets for PmP arrivals.

Example 4
The final synthetic example uses seismic data from a Moho structure with low-velocity anomalies in the upper mantle (Fig. 2d).The low-velocity layers may be composed of intrusive gabbro.The posterior distribution (Figs 17a and c) of the inferred velocity models from wide-aperture seismic data are consistent with the true model (dashed green line), with high probability densities for both the Moho discontinuity and velocity anomalies in the upper mantle.
From the posterior distribution for the depths of velocity nodes, velocity nodes are more likely to be located where there are large velocity variations (Figs 17b and d); for the hybrid model parametrization where the step function is used from 9 km depth to upper mantle, there are five spikes (Fig. 17d) at depths which are consistent with the seismic velocity discontinuities in the true model.
One of the attractive properties for FWI using trans-dimensional MCMC is that, besides the physical properties of the subsurface, it is also able to infer the model complexity for defining the velocitydepth structure, therefore removing the preparation work for determining the number of unknowns before inversion.seismograms matching those from the true model very well.There are trade-offs between velocity and interface depths (layer width); for example, a lower estimated velocity may be compensated by a relatively thicker layer (Fig. 19d) because of the frequency range and acquisition we used.

T R A N S -D I M E N S I O N A L F W I F O R M O H O S T RU C T U R E U S I N G O B S DATA F RO M T H E M I D -AT L A N T I C O C E A N
Synthetic studies using four different Moho scenarios demonstrate that, with little prior information, trans-dimensional MCMC is able Downloaded from https://academic.oup.com/gji/article/224/2/1056/5932268 by GEOMAR Bibliothek Helmholtz-Zentrum fuer Ozeanforschung user on 01 December 2020  to provide a probabilistic solution for a strong nonlinear FWI problem using wide-aperture seismic data, including the posterior distribution for the model complexity (the number of layers) and physical parameters.In this section we invert for the Moho structure using a field OBS data from the Mid-Atlantic Ocean.

OBS data from the Mid-Atlantic Ocean and pre-processing
The study region locates in the equatorial Atlantic Ocean over the crust formed at the MAR (Fig. 20).The field OBS data were acquired during the LITHOS-iLAB cruise in 2017 Novvember-December on board the R/V Maria S. Merian to study the agedependent features of the upper lithosphere from 0-50 Ma in the study region.There are 71 instruments in total consisting of 55 OBSs and 16 OBH (ocean bottom hydrophone), which were deployed along a 1100 km long transect with a variable spacing of 10-20 km, for recording long-offset wide-angle seismic refractions and reflections.The seismic profile (around 1000 km) mainly lies on the African plate (0-50 Ma) and crosses the ridge-axis on to the South American plate (0-2 Ma) for less than 75 km.All OBS were equipped with a hydrophone (measuring pressure) and three geophones (measuring vertical and horizontal displacements) whereas the OBH measured only the pressure.The data was sampled at 250 Hz for all the instruments.The active seismic source used in the survey comprised of 6 G-gun clusters (12 guns) configured as two subarrays with a total volume of 5440 cu in, which was towed at 7.5 m in depth and fired around every 400 m along the profile giving a total of 2735 shots.The relatively larger shot spacing was chosen to reduce the noise from neighbouring shots and hence enhancing the SNR for farther offsets.
Since the OBS is deployed from the sea surface, the actual OBS location is likely to be drifted away from the deployment position at surface, because of, for example, ocean currents.Therefore before processing the data, all OBS were repositioned using the direct water wave arrivals up to 5 km offset on both sides of the OBS.
We limit the data pre-processing to a minimum to keep the waveform information, including a bandpass filtering of 4-10 Hz  equation modelling is used for seismic wave simulation at each iteration of inversion.A predictive deconvolution is applied to suppress the bubble effects from the air gun sources.Traveltime tomography (Van Avendonk et al. 1998, 2004;Vaddineni et al. 2018) is first applied to the OBS gathers for estimating a P wave velocity model (Fig. 21).We consider that the velocity model up to 9 km depth is constrained from linearized inversion using, for example, the Pg phase (approximately a linear relation for velocity and waveforms of different offsets), and only invert for the velocity structure at depths between 9 and 12 km, where the Moho is likely to be present.The sparse 1-D model parametrization (no lateral heterogeneities) provide a good approximation for the velocity model within the study depth (Fig. 21, the model region between the two vertical blue lines), for OBS 56 with maximum offset 40 km, therefore we consider it a reasonable assumption for trans-dimensional FWI.There are relatively strong lateral heterogeneities in the velocity model from tomography, therefore we use 2-D acoustic wave equation modelling for a better modelling of wave physics in the crust than 1-D simulation.
We choose the PmP (denoted by dashed red line) and Pn (denoted by dashed black line) phases from the common OBS gather of OBS 56 (Fig. 22) as the observed data for estimating Moho discontinuities using trans-dimensional FWI.The offset range we used is between 24 to 36 km.OBS 56 is chosen mainly because of a good SNR at far offsets.

Source wavelet estimation
FWI requires a comparison of field and synthetic seismograms at each iteration, and the synthetic seismograms are obtained by full waveform forward modelling using a numerical solution of the wave equation.The velocity model estimated by FWI is sensitive to the source wavelet, as the simulated seismograms are determined by both the source signature and subsurface model.Thus it is important to obtain a reliable estimate for the source wavelet.
The direct arrival from the source position (water waves) and the seafloor reflection overlaps with each other in the common OBS gather (Fig. 22), along with strong scatterings and reflections from sedimentary layers close to the seafloor, making it difficult to isolate a reliable source wavelet.One option is to use the Pg phase.There is no interaction between turning waves and reflectors, the waveform of the Pg phase can potentially keep the source signature.However, the nearest-offset Pg phase that can be picked from OBS 56 data is around 5 km; lateral heterogeneities in the crust can damage the source signature during propagation.Alternatively, another option is the near-offset free-surface multiples; we observe a good separation between the free-surface multiples of water waves and reflections, as indicated by the yellow box in Fig. 22. Therefore the near-offset free-surface multiples of water waves are used for estimating source wavelet.
The dashed lines in Fig. 23(a) plot the recorded near-offset freesurface multiples of water waves, with offset ranges from -0.49 to 1.14 km.The water waves are filtered into the frequency range of 4-40 Hz.After trace alignment (the solid black lines of Fig. 23a), we observe great similarities between traces; the aligned trace are stacked together (the solid red line), which is then bandpass filtered (4-10 Hz, the green line).Because the reflection coefficient is −1 at the free surface, the waveform in green (Fig. 23a) is multiplied by −1 before being used as the source wavelet.
We use the estimated source wavelet for forward modelling using acoustic wave equation, with the velocity model in Fig. 21.Downloaded from https://academic.oup.com/gji/article/224/2/1056/5932268 by GEOMAR Bibliothek Helmholtz-Zentrum fuer Ozeanforschung user on 01 December 2020   shows a good match of the Pg phase between the field and the simulated data, which indicates that the estimated source wavelet provides a reliable source signature for FWI.We only need to adjust the amplitude of the synthetic data by multiplying a scaling factor, which can be obtained by comparing the amplitude of seismic waveforms between the synthetic and field data.

Prior for the field data
For inversion using seismic data from OBS 56, weak prior information is used.A Gamma distribution with a mean velocity value of 7.8 km s −1 , and a standard deviation of 0.8 km s −1 are used as prior distribution for the velocities.We set the maximum and minimum velocities at 6.6 and 9 km s −1 , respectively, to avoid unrealistic velocity values.
A symmetric Dirichlet prior distribution with concentration parameter of 1 is used for the vector of layer thickness, with the values summed equal to the maximum model depth, which we set to 3 km.
The prior over the number of velocity nodes is defined by a Poisson distribution.In this study, we use a value of 8 as the mean for the prior distribution of the node numbers, and set the maximum number to be 20.
Density is also incorporated into seismic forward modelling to avoid the overestimation of velocity for the subsurface model.We constrain the density from velocity using the following relationship (Shipp & Singh 2002): where V p is the P-wave velocity with unit of km s −1 , and ρ is the density with unit of kg m −3 .

Inversion results
The seismic traces between the offset range of 24 and 36 km are used as data constraints, which contain mainly PmP reflections around the critical angle and some of the Pn phase.We use 20 independent trans-dimensional MCMC chains to explore the model space.The inferred parameters include the number of velocity nodes for describing the model complexity, and the physical parameters shows the comparison between the simulated data using the estimated source wavelet and the field data for the Pg phase.
for defining the velocity discontinuities, including the velocity value and depth; the 1-D velocity models can then be interpolated using the sparsely distributed velocity nodes.There are 120 000 iterations for each MCMC chain.When analysing results, the first 80 000 iterations are considered as the 'burn-in' stage, and the sampled velocity models are discarded.Velocity models from the last 40 000 iterations of each chain are used as the posterior velocity model ensemble.Fig. 24 shows the posterior distributions of the velocity models and the distribution of velocity node (discontinuities) depths for the cubic spline interpolation (Figs 24a and b) and the hybrid model parametrization (Figs 24c and d).Fig. 24(a) indicates a Moho transition zone of about 0.7 km width with increasing velocities, and Fig. 24(c) prefers a Moho interface located at about 9.4 km depth, with velocity changing from about 7 km s −1 at lower crust to 8 km s −1 at upper mantle.The velocity uncertainty increases with increasing depths because of the lack of constraints for greater depths.
It is common to use one seismic gather for FWI with a 1-D assumption, especially in the framework of Bayesian theorem (Ray et al. 2016).For the completeness of this study, we also invert for Moho using two OBS gathers, with the additional OBS gather indicated by the yellow triangle in Fig. 20.We use the positive offsets for OBS 56 and negative offsets for OBS 54, with a mid-point between the two about 10 km.While a better resolution is usually expected with more data for illuminating the target, the posterior distribution (Fig. 25) from trans-dimensional FWI using two OBS gathers shows great similarity with the inversion results (Fig. 24) using only OBS 56; there is no obvious resolution improvement.Possible reasons include that, using additional OBS gathers nearby may not necessarily enrich the effective scattering angles for illuminating the Moho, because for both the two OBS gathers, the PmP arrivals appear only at far offsets around the critical angle.Secondly, a 1-D velocity-depth model for describing the Moho structure may be a coarse approximation for complex geological settings; using additional OBS gathers increase the data coverage, therefore may increase the discrepancies between the 1-D assumption and the 3-D subsurface.Using more OBS gathers for inversion makes it necessary to consider lateral heterogeneities by employing a more sophisticated model parametrization.
Fig. 26 shows the MAP models from the posterior velocity ensembles for both of the two model parametrizations, and their corresponding comparison with the field OBS data.The predicted seismic waveforms (solid black lines in Figs 26b and d) from the two MAP models are very similar and both match the observed data (dashed red lines) well.Whereas there is ambiguities in determining the preferred velocity model, because of the limited resolution of the band-limited (4-10 Hz for the field data) seismic data and the limited available offsets that is used for this study, there are substantial velocity changes between 9.3 and 9.6 km, supported by the posterior distribution of velocity models (Figs 24a and c) and the distribution for depths of velocity nodes (Figs 24b and d).
As there are two inverted models (Fig. 24) for the field data, it may be reasonable to follow Occam's razor rule for choosing a simpler model; on the other hand, we notice that Moho reflection can only be observed in the far offsets (larger than 20 km).The missing of Moho reflection in the near to intermediate offsets may indicate a weak reflection for smaller scattering angles.Considering that PmP from a smooth Moho transition can be identical at far offsets with that from Moho with sharp boundary, but the amplitude can be much weaker at small offsets (see Figs 7  and 8), therefore it may be reasonable to prefer the Moho model with a 0.7 km transition zone.However, it is possible that PmP was recorded but interfering with other waves in the near offsets.It can be helpful to acquire new wide-aperture seismic data with higher S/N and wide frequency spectrum for improving the resolution of Moho.
The results of this study indicate the possibility of a layered structure in the vicinity of Moho (Fig. 27) at slow spreading MAR.Although the two MAP models are different, they suggest a 1.5-2 km thick transition zone in the upper mantle.One of the models shows lower and higher velocity layers in the upper mantle, where low-velocity layers in the mantle can be composed of Gabbro layers, and the high-velocity layers may indicate Peridotite; the other model  indicates increasing velocity layers in the upper mantle.These results are consistent with the absence of any strong seismic reflection image for slow spreading oceanic crust and observation from the Bay of Island Ophiolite (Brocher et al. 1985).The velocity structure in the relatively deeper part of the upper mantle is mainly constrained by the Pn phase, which has limited penetration depth.
There are relatively large waveform misfits at very far offsets (start from 35.2 km), where the overlap of PmP and Pn phases appears (Fig. 22).The results can be improved using more sophisticated model parametrization for describing the possibly heterogeneous upper mantle, and more efficient model searching algorithms for practical application in future studies.

D I S C U S S I O N
FWI of the PmP phase is rarely used for Moho model building, because of the strong nonlinearity observed in the vicinity of the critical offsets, the waveform interference from Pg and Pn phases, and the relatively low SNR for PmP phase compared with seismic reflections from the upper crust (Grad et al. 2009;Hrubcová et al. 2013;Jian et al. 2017;Beller et al. 2018).Trans-dimensional MCMC provides a fully nonlinear approach for inverse problems (Sambridge et al. 2006).Unlike the common linearized inversion method that the solution is a single final model which strongly depends on the starting model, the trans-dimensional MCMC method searches the model space directly, and estimates the posterior distribution of the seismic velocity model.Compared with common MCMC methods (e.g. the Metropolis-Hastings algorithm) which require the number of unknown (e.g. the number of velocity layers for 1-D inversion) as a fixed prior, the proposed method is able to explore model space with a variable number of parameters, making the inversion workflow completely data-driven and simplifying the inversion preparation work.
To better simulate the physics of seismic wave propagation, 2-D wave equation modelling is performed to take into account the lateral heterogeneity in the upper crust, as observed in the velocity model from traveltime tomography.The computational time per MCMC chain using 10 CPU cores is about 50 hr for 100 000 iterations in the synthetic studies, and about 70 hr for the field study using one OBS gather.MCMC sampling usually requires a large number of iterations to find the global optimum, and the required iteration increases dramatically with the increasing number of parameters (the curse of dimensionality; Sambridge et al. 2006).Limited by computational resources, we use a sparse model parametrization with a 1-D assumption for velocity models close to Moho, for the purpose of reducing model dimension and consequently an efficient search of the model space.The 1-D assumption can be reasonable for Moho structure inversion within the offset range for one common-OBS seismic gather, but may be not able to describe the model complexity for large-scale seismic survey or when there are complex geological settings.For future studies, it will be important to use a more advanced sparse model parametrization, for example, wavelet parametrization (Hawkins & Sambridge 2015), to allow a balance between the ability for characterizing complex (at least 2-D) earth models and minimizing the number of unknowns.Gradient information from adjoint-state FWI can be used for accelerating the model space exploration of Monte Carlo search, for example in Hamiltonian Monte Carlo (Biswas & Sen 2017;Fichtner et al. 2018;Fichtner & Zunino 2019;Zhao et al. 2019;Gebraad et al. 2020), therefore may allow us to use high-dimensional model parametrization.Variational inference may also provide an efficient solution for Bayesian inference problems (Zhang & Curtis 2020).
In the present study, the model obtained from traveltime tomography is preserved down to 9 km depth.It is worthwhile to mention that the resolution of traveltime tomography is significantly lower than that of the FWI, and therefore there may be potential traveltime errors when simulating seismograms using the tomographic model.The possible errors in the tomographic model may shift the Moho transition towards shallower (overestimated crust velocity) or deeper (underestimated crust velocity) depths.It would be useful to perform adjoint-state FWI for improving the oceanic crust  model for better accuracy, or conduct a complete Bayesian inversion workflow for the whole model including both crust and Moho (which requires more efficient sampling algorithms).With these in mind, this study is more interested in characterizing the Moho structure than its absolute depth.This is left for future studies.
The acoustic assumption is an approximation to the (visco)elastic earth, and has been widely used for FWI (Tarantola 1984;Virieux & Operto 2009;Kamei et al. 2013).With many successful application for acoustic FWI, we should bear in mind that many of them only use the turning waves (Kamei et al. 2013).Górszczyk et al. (2017) presented a robust workflow for OBS FWI in a complex subduction zone, with data constraints including not only the more linearized events of refracted and diving waves, but also wide angle Moho reflections.However, a very accurate velocity model is required in the vicinity of Moho in order to avoid getting trapped into local minima.In this study, we use mainly wide-aperture PmP reflections around critical angles as data for the Moho structure.Previous studies (Chapman et al. 2014;Guo et al. 2019) indicate that there can be large waveform difference in both amplitude and phase, between data predicted by acoustic and elastic wave equations, especially around critical angles.Fig. 28 shows a comparison of elastic and acoustic wave modelling using the velocity profile in Fig. 2(a) and the acquisition geometry in Fig. 3.The S-wave velocity is derived from the P-wave velocity using an empirical relation following Brocher (2005), with velocities in the sea water setting to zero.The green oval indicates the Moho reflection (PmP).The waveform matches well for the first arrival (Pg and Pn) for acoustic and elastic modelling, but there is more difference for PmP.Moho reflection from elastic modelling is generally weaker than that from acoustic modelling, because of the energy conversion between P and S waves.An elastic wave equation modelling is expected to provide a more realistic modelling of the physics for wave propagation in the solid earth, and therefore have the potential for improving the field data inversion results, at the trade-off of a significant increase on the computational cost.This is left for future studies.

C O N C L U S I O N S
In the presence of a high-velocity contrast, such as at Moho, FWI using wide-aperture seismic data is a strong nonlinear problem because of the abrupt changes in reflection coefficients around the critical angles and interferences from different seismic phases, therefore the widely used linearized inversion approach may not work.We propose to use the nonlinear trans-dimensional MCMC method for solving the seismic FWI problem in the context of Moho structure building, with the state-of-the-art seismic waveform modelling by finite-difference solution of wave equation.The use of 2-D wave equation modelling allows us to incorporate the wave physics for the heterogeneous upper crust structure, and the 1-D model parametrization near Moho enables an efficient search of the transdimensional model space.Trans-dimensional MCMC provides a complete data-driven probabilistic solution for the nonlinear FWI problem, with the outcome including the posterior distribution for both the velocity-depth structure, and the model complexity (the optimal number of parameters for describing the model).
With very weak prior information, synthetic studies using four different Moho scenarios demonstrate the robustness of the transdimensional MCMC method, with high probabilities in the posterior distribution consistent with the target velocity structure.Inversion results from the field OBS data in the Mid-Atlantic Ocean indicates Downloaded from https://academic.oup.com/gji/article/224/2/1056/5932268 by GEOMAR Bibliothek Helmholtz-Zentrum fuer Ozeanforschung user on 01 December 2020 Figure 28.A comparison of simulated seismograms using acoustic (solid black line) and elastic (dashed red line) wave equations.The acquisition geometry is the same with that in Fig. 3.We use the P-wave velocity profile in Fig. 2(a); the S-wave velocity is derived from the P-wave velocity using an empirical relation following Brocher (2005), with velocities in the sea water setting to zero.The green oval indicates the Moho reflection (PmP).
that Moho in the study area can be a transition zone of around 0.7 km, or it can also be a sharp boundary with velocities from around 7 km s −1 in the lower crust to 8 km s −1 of the upper mantle; both provide similar waveform match for the field data.The ambiguity comes from the resolution limit of the band-limited seismic data and from the limited scattering angle range for observed Moho reflection.The inferred posterior distribution of interfaces (velocity nodes) strongly coincides with the depth of velocity discontinuity.Preliminary results from our study indicate layered structure in the vicinity of Moho.

A C K N O W L E D G E M E N T S
The OBS data were acquired during the LITHOS experiment on board the German R/V Maria S. Merian in 2017, in the leadership of Dr Ingo Grevemeyer.This research was funded by the Deep Earth Imaging Future Science Platform, CSIRO.This project was funded by the European Research Council Advanced Grant agreement No. 339442 TransAtlanticILAB.The cruise with the ID MSM69 was funded by the German Science Foundation (DFG).This work was supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia.We also thank the computational resources supported by S-CAPAD.Finally, we thank Editor Dr Michal Malinowski, and reviewer Dr Andrzej Górszczyk and an anonymous reviewer for their constructive comments which significantly improved the paper.

Figure 1 .
Figure 1.The 1-D velocity model can be defined using a series of sparsely distributed velocity nodes, with (a) the cubic spline interpolation and (b) the hybrid model parametrization.

Figure 2
Figure 2. 1-D velocity models representing four different Moho structures for synthetic tests, with (a) a sharp boundary, (b) a smooth transition, (c) step functions with increasing velocity and (d) with low-velocity anomalies in the upper mantle.

Figure 3 .
Figure3.2-D model with horizontally invariant velocities, using the velocity profile in Fig.2(a).The blue symbol refers to location of the ocean bottom seismometer (OBS) for recording seismic waves, and the green dots near the water surface refer to seismic sources.Using reciprocity, the OBS is treated as the 'source', and the sources close to the water surface are treated as 'receivers' for seismic modelling.
(b) from the Moho model with a transition zone, which indicates that the non-uniqueness can be reduced when there are more scattering angles in the data for illumination.Figs 9(a) and (b) show a comparison of the prior and posterior distributions for the number of velocity nodes (model 'dimension' in the context of trans-dimension MCMC) using the two parametriza-Downloaded from https://academic.oup.com/gji/article/224/2/1056/5932268 by GEOMAR Bibliothek Helmholtz-Zentrum fuer Ozeanforschung user on 01 December 2020 P. Guo et al.

Figure 4 .
Figure 4. (b) A common OBS gathers using the velocity model of Fig. 2(a) (also plotted in (a)); and (d) a common OBS gathers using the velocity model in (c), which is the model at the first iteration of the trans-dimensional MCMC FWI.Three seismic phases are identified, including 'Pg' (turning wave), 'PmP' (Moho reflection) and 'Pn' (refraction wave from Moho).The seismic data is simulated with a finite-difference solution of the 2-D acoustic wave equation (eq.4).
Figs 14(a) and (c) contain the velocity mean and the maximum a posterior probability (MAP) from the velocity model ensembles for the two model parametrizations, Downloaded from https://academic.oup.com/gji/article/224/2/1056/5932268 by GEOMAR Bibliothek Helmholtz-Zentrum fuer Ozeanforschung user on 01 December 2020

Figure 5 .
Figure 5.A group of selected seismic traces (solid black line) from Fig. 4(b); the seismic data after adding noise (dashed red line) is used as the 'observed' data for FWI.

Figure 6 .
Figure 6.The posterior distribution density of the inferred velocity model from trans-dimensional FWI, using (a) the cubic spline interpolation and (c) the hybrid model parametrization, respectively; (b) and (d) shows the posterior distribution of the velocity node depths for these two parametrizations, respectively.The concentration of velocity nodes coincide with the distribution of velocity discontinuities in the true model.Prior for the velocity values are bounded between 6 and 8.6 km s −1 with a mean of 7 km s −1 and a standard deviation of 1 km s −1 for a Gamma distribution.respectively.Both the mean and MAP in Fig. 14(a) are almost identical to the true model.The MAP of the hybrid model parametrization (Fig. 14c) contains multiple layers in the Moho transition zone for describing the velocity changes in the true model; there is a great similarity between the mean and the true model in the transition zone.The difference (Figs 14b and d) between the predicted data using MAP and that from the true model is negligible; the difference in MAP models indicate the non-uniqueness of the FWI problem.

Figure 7 .
Figure 7.The mean and the maximum a posterior probability (MAP) models for (a) the cubic spline interpolation and (c) the hybrid model parametrization; the true model is plotted using the dashed red line.(b) and (d) show the corresponding OBS gathers from the true (dashed red line) model, and from the MAP (solid black line) models for the two parametrizations, respectively; the waveforms from the MAP and the true models are visually identical.

Figure 8 .
Figure 8.(a) and (c) plot the maximum a posterior probability (MAP, solid black line) models from inversion using (a) the cubic spline interpolation and (c) the hybrid model parametrization, and the true model (dashed red line).(b) and (d) show the corresponding simulated OBS gathers from the MAP (solid black line) for the two parametrizations, respectively, and for the true (dashed red line) model.The amplitudes of MAP are weaker than the true data in (b), and those in (d) are visually identical for both the two parametrizations.

Figure 9 .Figure 10 .
Figure 9.The prior and posterior probability for the number of velocity nodes using (a) the cubic spline interpolation and (b) the hybrid model parametrization.nodes in the posterior models increases, because there are more velocity discontinuities in the velocity model ofFig.2(d)  than that in Fig.2(a).Fig.19shows three randomly selected velocity models from the posterior ensemble, and the comparison between the correspond-

Figure 11 .
Figure 11.The evolution of velocity value at depth 10 km as a function of iterations for (a) the cubic spline interpolation and (b) the hybrid model parametrization.The vertical dashed black line is located at 60 000 iterations, where MCMC chains are considered reaching convergence.The variance of the post-burn-in stage for the two parametrizations is also labelled in the plots.

Figure 12 .
Figure 12.The posterior distribution density of the inferred velocity model from trans-dimensional FWI, using (a) the cubic spline interpolation and (c) the hybrid model parametrization, respectively; (b) and (d) shows the posterior distribution of the velocity node depths for these two parametrizations, respectively.The concentration of velocity nodes coincide with the distribution of velocity discontinuities in the true model.

Figure 13 .
Figure 13.(a) The seismic data simulated using the true model in Fig. 2(b); (b) the difference with the data using the velocity model (Fig. 4c) at the first iteration of trans-dimensional MCMC.The colour scale is the same with Fig. 4. Note that a reduced velocity of 8 km s −1 has been applied to the data.

Figure 14 .Figure 15 .Figure 16 .Figure 17 .Figure 18 .Figure 19 .
Figure 14.The mean and the maximum a posterior probability (MAP) models for (a) the cubic spline interpolation and (c) the hybrid model parametrization; the true model is plotted using the dashed red line.(b) and (d) show the corresponding data difference between the true data and data simulated using MAP for the two model parametrizations; the difference is negligible.Note that a reduced velocity of 8 km s −1 has been applied to the data, and the colour scale here is smaller than that in Fig. 4 for visualizing the difference.(a) (b) (c) (d) Figure 15.The posterior distribution density of the inferred velocity model from trans-dimensional FWI, using (a) the cubic spline interpolation and (c) the hybrid model parametrization, respectively; (b) and (d) shows the posterior distribution of the velocity node depths for these two parametrizations, respectively.The concentration of velocity nodes coincide with the distribution of velocity discontinuities in the true model.Downloaded from https://academic.oup.com/gji/article/224/2/1056/5932268 by GEOMAR Bibliothek Helmholtz-Zentrum fuer Ozeanforschung user on 01 December 2020

Figure 20 .
Figure 20.The OBS seismic survey over the crust formed at the Mid-Atlantic Ridge (MAR) in the Mid-Atlantic Ocean.The black dots refer to the locations of a total of 71 OBS instruments.The red triangle refers to the location for OBS 56, and the yellow one nearby refers to OBS 54.

Figure 21 .
Figure 21.The velocity model from traveltime tomography.The red cross refers to the location of OBS 56.The model between the two vertical blue lines is used for trans-dimensional FWI using seismic data of OBS 56; the velocity model to 9 km depth is kept unchanged during FWI, and we only invert for the velocity structure from 9 to 12 km in depth using 1-D model assumption.

Figure 22 .
Figure 22.Seismic data for OBS 56, where the 'PmP', 'Pg' and 'Pn' phases are indicated by the dashed red, blue and black lines.The yellow box at near offsets around traveltime of 7 s shows a good separation between free-surface multiples of the water waves and subsurface reflections.

Fig
Fig. 23(b)  shows a good match of the Pg phase between the field and the simulated data, which indicates that the estimated source wavelet provides a reliable source signature for FWI.We only need to adjust the amplitude of the synthetic data by multiplying a scaling factor, which can be obtained by comparing the amplitude of seismic waveforms between the synthetic and field data.

Figure 23 .
Figure23.(a) the seismic traces (dashed line lines) of free-surface multiple water waves at near offsets from the area circled by the yellow box in Fig.22; the red line shows the stacking result after trace alignment (solid black lines).The green line shows the bandpass filtered (4-10 Hz) waveform from the stacking result.Because the reflection coefficient is −1 at free surface, the waveform of green colour multiplied by -1 is used as the source wavelet.(b) shows the comparison between the simulated data using the estimated source wavelet and the field data for the Pg phase.

Figure 24 .Figure 25 .
Figure 24.The posterior distribution density of the inferred velocity model from trans-dimensional FWI using one common OBS gather (OBS 56), using (a) the cubic spline interpolation and (c) the hybrid model parametrization, respectively; (b) and (d) show the posterior distribution of the velocity node depths for these two respectively.

Figure 26 .
Figure 26.The maximum a posterior probability (MAP) models for (a) the cubic spline interpolation and (c) the hybrid model parametrization.(b) and (d) show the comparison between the field OBS data and the data simulated using MAP for the two model parametrizations.The offset increment for the plotted traces is about 0.4 km.

Figure 27 .
Figure 27.(a) and (c) are the MAP models from Figs 26(a) and (c); (b) and (d) are the schematic diagrams showing the Moho and the layering structures in the upper mantle from the inferred velocity models. ,