HPF Observations of completely blank sky

In a previous notebook we illustrated how to refine HPF sky subtraction based on calibrated twilight flats. Here we acquire novel calibration data of completely blank sky to measure the ratio between sky lines without the vagueries of pesky continuum.

[1]:
from muler.hpf import HPFSpectrum, HPFSpectrumList

import matplotlib.pyplot as plt
%config InlineBackend.figure_format='retina'
plt.rcParams['axes.facecolor']='white'
plt.rcParams['savefig.facecolor']='white'
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 from muler.hpf import HPFSpectrum, HPFSpectrumList
      3 import matplotlib.pyplot as plt
      4 get_ipython().run_line_magic('config', "InlineBackend.figure_format='retina'")

File ~/checkouts/readthedocs.org/user_builds/muler/envs/latest/lib/python3.8/site-packages/muler/hpf.py:14
     12 import warnings
     13 import logging
---> 14 from muler.echelle import EchelleSpectrum, EchelleSpectrumList
     15 from muler.utilities import resample_list
     16 import numpy as np

File ~/checkouts/readthedocs.org/user_builds/muler/envs/latest/lib/python3.8/site-packages/muler/echelle.py:31
     29 from scipy.ndimage import median_filter, gaussian_filter1d
     30 import specutils
---> 31 from muler.utilities import apply_numpy_mask, is_list, resample_list
     34 # from barycorrpy import get_BC_vel
     35 from astropy.coordinates import SkyCoord, EarthLocation

File ~/checkouts/readthedocs.org/user_builds/muler/envs/latest/lib/python3.8/site-packages/muler/utilities.py:15
     13 from astropy.convolution import convolve, Gaussian1DKernel
     14 from scipy.ndimage import binary_dilation
---> 15 from astroquery.simbad import Simbad
     16 Simbad.add_votable_fields('V', 'B', 'J', 'H', 'K', 'parallax')
     17 LinInterpResampler = LinearInterpolatedResampler()

ModuleNotFoundError: No module named 'astroquery'

You can retrieve these data from the muler example data GitHub site. There are ten examples online, let’s just look at one of those.

[2]:
online_path = 'https://github.com/OttoStruve/muler_example_data/raw/main/HPF/02_blank_sky/'
filename = 'Goldilocks_20210825T052211_v1.0_0021.spectra.fits'
spectra = HPFSpectrumList.read(online_path+filename)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 3
      1 online_path = 'https://github.com/OttoStruve/muler_example_data/raw/main/HPF/02_blank_sky/'
      2 filename = 'Goldilocks_20210825T052211_v1.0_0021.spectra.fits'
----> 3 spectra = HPFSpectrumList.read(online_path+filename)

NameError: name 'HPFSpectrumList' is not defined
[3]:
spectra.plot(ylo=0, yhi=100);
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 spectra.plot(ylo=0, yhi=100);

NameError: name 'spectra' is not defined

Neat, we see sky emission lines across the spectrum. Let’s take a look up close:

[4]:
spectrum = spectra[19]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 spectrum = spectra[19]

NameError: name 'spectra' is not defined
[5]:
ax= spectrum.sky.plot(ylo=-5, yhi=100, label='Sky Fiber');
spectrum.plot(ax=ax, label='Target Fiber');
ax.legend();
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 ax= spectrum.sky.plot(ylo=-5, yhi=100, label='Sky Fiber');
      2 spectrum.plot(ax=ax, label='Target Fiber');
      3 ax.legend();

NameError: name 'spectrum' is not defined

Neat! We see slight differences between the two fibers. Let’s drill down even more to see how the different sky-subtraction methods perform:

[6]:
ax = spectrum.plot(label='Science Fiber');
for method in ['naive', 'scalar', 'vector']:
    spectrum.sky_subtract(method=method).plot(ax=ax, label=method);

ax.set_xlim(10_830, 10_840)
ax.set_ylim(-10, 60)
ax.legend();
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 ax = spectrum.plot(label='Science Fiber');
      2 for method in ['naive', 'scalar', 'vector']:
      3     spectrum.sky_subtract(method=method).plot(ax=ax, label=method);

NameError: name 'spectrum' is not defined

Indeed the naive approach (orange) over-subtracts, leaving a conspicuous dip. The scalar and vector corrections both work fine!

Compute the Science-to-sky fiber ratio over the entire HPF bandpass

Let’s look at all the high signal-to-noise ratio sky emission lines— do they have the same ratios?

[7]:
import numpy as np
from specutils.manipulation import LinearInterpolatedResampler, SplineInterpolatedResampler
resampler = SplineInterpolatedResampler()
[8]:
twilight_ratio_raw = spectrum.get_static_sky_ratio_template()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 twilight_ratio_raw = spectrum.get_static_sky_ratio_template()

NameError: name 'spectrum' is not defined
[9]:
full_science = spectra.stitch()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 full_science = spectra.stitch()

NameError: name 'spectra' is not defined

Let’s resample the sky spectrum to the exact wavelength coordinates as the science spectrum.

[10]:
resampled_sky_spectrum = resampler(full_science.sky, full_science.wavelength)
resampled_sky_spectrum = HPFSpectrum(spectral_axis=resampled_sky_spectrum.wavelength,
                                    flux=resampled_sky_spectrum.flux,
                                    uncertainty=resampled_sky_spectrum.uncertainty)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 resampled_sky_spectrum = resampler(full_science.sky, full_science.wavelength)
      2 resampled_sky_spectrum = HPFSpectrum(spectral_axis=resampled_sky_spectrum.wavelength,
      3                                     flux=resampled_sky_spectrum.flux,
      4                                     uncertainty=resampled_sky_spectrum.uncertainty)

NameError: name 'full_science' is not defined
[11]:
full_sky = resampled_sky_spectrum.flux
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 full_sky = resampled_sky_spectrum.flux

NameError: name 'resampled_sky_spectrum' is not defined
[12]:
full_snr = resampled_sky_spectrum.snr
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 full_snr = resampled_sky_spectrum.snr

NameError: name 'resampled_sky_spectrum' is not defined

Let’s just pick the very high SNR pixels.

[13]:
full_mask = full_snr.value > 40
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 full_mask = full_snr.value > 40

NameError: name 'full_snr' is not defined
[14]:
ax = resampled_sky_spectrum.plot(yhi=110);
(resampled_sky_spectrum*full_mask).plot(ax=ax);
ax.set_ylabel('Signal-to-noise ratio (per pixel)');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[14], line 1
----> 1 ax = resampled_sky_spectrum.plot(yhi=110);
      2 (resampled_sky_spectrum*full_mask).plot(ax=ax);
      3 ax.set_ylabel('Signal-to-noise ratio (per pixel)');

NameError: name 'resampled_sky_spectrum' is not defined

Looks good, we are focusing on the highest-SNR emission lines.

[15]:
full_ratio = full_science.divide(resampled_sky_spectrum, handle_meta='ff').multiply(full_mask*full_science.flux.unit)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[15], line 1
----> 1 full_ratio = full_science.divide(resampled_sky_spectrum, handle_meta='ff').multiply(full_mask*full_science.flux.unit)

NameError: name 'full_science' is not defined
[16]:
ax = full_ratio.plot(ylo=0.8, yhi=1.1, linestyle='none', marker='o', ms=3, alpha=0.8, label='Blank Night Sky')
twilight_ratio_raw.plot(ax=ax, linestyle='none', marker='.', ms=1, label='Twilight Flats');

ax.set_title('HPF Science Fiber to Sky Fiber Ratio')

ax.axhline(1.0, linestyle='dotted', color='k')

ax.set_ylabel('Ratio')
ax.set_ylim(0.8, 1.05)
ax.legend();
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 ax = full_ratio.plot(ylo=0.8, yhi=1.1, linestyle='none', marker='o', ms=3, alpha=0.8, label='Blank Night Sky')
      2 twilight_ratio_raw.plot(ax=ax, linestyle='none', marker='.', ms=1, label='Twilight Flats');
      4 ax.set_title('HPF Science Fiber to Sky Fiber Ratio')

NameError: name 'full_ratio' is not defined

Nice, we get about the same answer and trend using these data and the twilight flat data. Se see a high amount of scatter, but comparable to the scatter seen in the twilight flats themeselves (not shown). Overall the scatter must come in part from random noise, and possible some weak trends from genuine physics– truly different atmospheric temperatures, airmasses, pressures, wind shears, etc.

Deep dive: Do sky-free pixels show any other artifacts?

The background flux in most pixels is so close to zero that the denominator in the ratio has many near-zero values, sending the ratio spectrum to near-infinity. Here we compute the difference spectrum—scaled by the twilight flat vector—and divided by the Goldilocks-reported uncertainty vector. Most of the nominally empty pixels in the difference vector constructed in this way should have mean value of zero with a propagated \(\sigma\) of \(\sqrt{1^2 + 0.93^2}\).

[17]:
from scipy.stats import norm
import celerite2
from celerite2 import terms
[18]:
full_twilight_ratio = resampler(twilight_ratio_raw, full_science.spectral_axis)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 full_twilight_ratio = resampler(twilight_ratio_raw, full_science.spectral_axis)

NameError: name 'twilight_ratio_raw' is not defined

Compute the residual, and normalize by the uncertainty.

\(\mathbf{r} = \mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}}\)

[19]:
residual = (full_science - full_twilight_ratio*full_sky)
sigmas = np.hstack([spec.uncertainty.array for spec in spectra])
residual_in_sigma = residual.flux.value/(np.hypot(1.0, 0.93)*sigmas)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 residual = (full_science - full_twilight_ratio*full_sky)
      2 sigmas = np.hstack([spec.uncertainty.array for spec in spectra])
      3 residual_in_sigma = residual.flux.value/(np.hypot(1.0, 0.93)*sigmas)

NameError: name 'full_science' is not defined

Smooth the residual spectrum with a Gaussian Process.

[20]:
mask = np.isfinite(residual_in_sigma)
x = residual.wavelength.value[mask]
y = residual_in_sigma[mask]
N_pix = len(x)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 mask = np.isfinite(residual_in_sigma)
      2 x = residual.wavelength.value[mask]
      3 y = residual_in_sigma[mask]

NameError: name 'residual_in_sigma' is not defined
[21]:
kernel = terms.SHOTerm(sigma=0.8, rho=200.0, Q=0.25)

gp = celerite2.GaussianProcess(kernel, mean=0.1)
gp.compute(x, yerr=np.ones(N_pix))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[21], line 4
      1 kernel = terms.SHOTerm(sigma=0.8, rho=200.0, Q=0.25)
      3 gp = celerite2.GaussianProcess(kernel, mean=0.1)
----> 4 gp.compute(x, yerr=np.ones(N_pix))

NameError: name 'x' is not defined
[22]:
plt.figure(figsize=(15, 5))
plt.plot(x, y, linestyle='none', marker='o', alpha=0.1, ms=3, color='#2980b9', label=None)
plt.plot(x, y*0-999, linestyle='none', marker='o', alpha=1, ms=3, color='#2980b9',
         label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$')
plt.plot(x, gp.predict(y, x), linestyle='none', color='#e67e22', marker='.', ms=2, label=None)

plt.plot(x, y*0-999, color='#e67e22', label='Systematic bias')
#plt.plot(x, gp.predict(y_noise, x), linestyle='none', color='#95a5a6', marker='.', ms=2, label='IID')
plt.scatter(residual.wavelength.value[full_mask], residual_in_sigma[full_mask],
         marker='o', s=29, ec='k', c='#f1c40f', zorder=10, label='High SNR Emission Line Pixels')
plt.axhline(0.0, color='k', linestyle='dashed')
plt.ylim(-5, 5)
plt.legend(ncol=3)
plt.xlabel('$\lambda \; (\AA)$')
plt.ylabel('Residual $(\sigma)$')
#plt.savefig('../../paper/data_analysis_I/figures/Scaled_sky_residual_trend.png', dpi=300, bbox_inches='tight')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 plt.figure(figsize=(15, 5))
      2 plt.plot(x, y, linestyle='none', marker='o', alpha=0.1, ms=3, color='#2980b9', label=None)
      3 plt.plot(x, y*0-999, linestyle='none', marker='o', alpha=1, ms=3, color='#2980b9',
      4          label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$')

NameError: name 'plt' is not defined

We see minor systematic trends with an amplitude of about <0.5 sigma, large enough to bloat the realized RMS from its expected value of \(\sqrt{1^2 + 0.93^2}\).

We can make a histogram to quantify the bloating scale.

[23]:
bins = np.arange(-15, 15, 0.2)

standard_normal = norm(loc=0, scale=1).pdf(bins)
refined_uncertainty = norm(loc=0.1, scale=1.3).pdf(bins)
[24]:
plt.hist(residual_in_sigma, bins=bins, density=True, label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$');
plt.plot(bins, standard_normal, lw=3,
         label='Standard normal (Expected with accurate uncertainties)');
plt.plot(bins, refined_uncertainty, lw=2, color='k',
         label='Observed: $\hat\sigma = 1.30\sigma_{\mathrm{G}} \quad \hat\mu = 0.1\sigma_{\mathrm{G}}$');
plt.xlabel('Residual ($\sigma_{\mathrm{G}}$)')

plt.hist(gp.predict(y, x), bins=bins, density=True, label='Systematic Bias');

plt.yscale('log')
plt.ylim(1e-4, 1e3)
plt.xlim(-6, 6)
plt.legend(loc='best')
#plt.savefig('../../paper/data_analysis_I/figures/Scaled_sky_residual_uncertainty.png', dpi=300, bbox_inches='tight')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 plt.hist(residual_in_sigma, bins=bins, density=True, label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$');
      2 plt.plot(bins, standard_normal, lw=3,
      3          label='Standard normal (Expected with accurate uncertainties)');
      4 plt.plot(bins, refined_uncertainty, lw=2, color='k',
      5          label='Observed: $\hat\sigma = 1.30\sigma_{\mathrm{G}} \quad \hat\mu = 0.1\sigma_{\mathrm{G}}$');

NameError: name 'plt' is not defined

We see find a realized uncertainty of \(1.3\times\) the expected value, in other words Goldilocks underestimates the uncertainty by 1.3x, attributable to this systematic bias trend. I get the same trend if I use a scalar (\(\bar \beta=0.93\)) prefactor to the sky fiber, so I know I am not injecting this signal artificially from the twilight flats. All things considered, this systematic bias is pretty tiny and immaterial for most scientific applications.