HPF Observations of completely blank sky¶
In a previous notebook we illustrated how to refine HPF sky subtraction based on calibrated twilight flats. Here we acquire novel calibration data of completely blank sky to measure the ratio between sky lines without the vagueries of pesky continuum.
[1]:
from muler.hpf import HPFSpectrum, HPFSpectrumList
import matplotlib.pyplot as plt
%config InlineBackend.figure_format='retina'
plt.rcParams['axes.facecolor']='white'
plt.rcParams['savefig.facecolor']='white'
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from muler.hpf import HPFSpectrum, HPFSpectrumList
3 import matplotlib.pyplot as plt
4 get_ipython().run_line_magic('config', "InlineBackend.figure_format='retina'")
File ~/checkouts/readthedocs.org/user_builds/muler/envs/latest/lib/python3.8/site-packages/muler/hpf.py:14
12 import warnings
13 import logging
---> 14 from muler.echelle import EchelleSpectrum, EchelleSpectrumList
15 from muler.utilities import resample_list
16 import numpy as np
File ~/checkouts/readthedocs.org/user_builds/muler/envs/latest/lib/python3.8/site-packages/muler/echelle.py:31
29 from scipy.ndimage import median_filter, gaussian_filter1d
30 import specutils
---> 31 from muler.utilities import apply_numpy_mask, is_list, resample_list
34 # from barycorrpy import get_BC_vel
35 from astropy.coordinates import SkyCoord, EarthLocation
File ~/checkouts/readthedocs.org/user_builds/muler/envs/latest/lib/python3.8/site-packages/muler/utilities.py:15
13 from astropy.convolution import convolve, Gaussian1DKernel
14 from scipy.ndimage import binary_dilation
---> 15 from astroquery.simbad import Simbad
16 Simbad.add_votable_fields('V', 'B', 'J', 'H', 'K', 'parallax')
17 LinInterpResampler = LinearInterpolatedResampler()
ModuleNotFoundError: No module named 'astroquery'
You can retrieve these data from the muler example data GitHub site. There are ten examples online, let’s just look at one of those.
[2]:
online_path = 'https://github.com/OttoStruve/muler_example_data/raw/main/HPF/02_blank_sky/'
filename = 'Goldilocks_20210825T052211_v1.0_0021.spectra.fits'
spectra = HPFSpectrumList.read(online_path+filename)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[2], line 3
1 online_path = 'https://github.com/OttoStruve/muler_example_data/raw/main/HPF/02_blank_sky/'
2 filename = 'Goldilocks_20210825T052211_v1.0_0021.spectra.fits'
----> 3 spectra = HPFSpectrumList.read(online_path+filename)
NameError: name 'HPFSpectrumList' is not defined
[3]:
spectra.plot(ylo=0, yhi=100);
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[3], line 1
----> 1 spectra.plot(ylo=0, yhi=100);
NameError: name 'spectra' is not defined
Neat, we see sky emission lines across the spectrum. Let’s take a look up close:
[4]:
spectrum = spectra[19]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[4], line 1
----> 1 spectrum = spectra[19]
NameError: name 'spectra' is not defined
[5]:
ax= spectrum.sky.plot(ylo=-5, yhi=100, label='Sky Fiber');
spectrum.plot(ax=ax, label='Target Fiber');
ax.legend();
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[5], line 1
----> 1 ax= spectrum.sky.plot(ylo=-5, yhi=100, label='Sky Fiber');
2 spectrum.plot(ax=ax, label='Target Fiber');
3 ax.legend();
NameError: name 'spectrum' is not defined
Neat! We see slight differences between the two fibers. Let’s drill down even more to see how the different sky-subtraction methods perform:
[6]:
ax = spectrum.plot(label='Science Fiber');
for method in ['naive', 'scalar', 'vector']:
spectrum.sky_subtract(method=method).plot(ax=ax, label=method);
ax.set_xlim(10_830, 10_840)
ax.set_ylim(-10, 60)
ax.legend();
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[6], line 1
----> 1 ax = spectrum.plot(label='Science Fiber');
2 for method in ['naive', 'scalar', 'vector']:
3 spectrum.sky_subtract(method=method).plot(ax=ax, label=method);
NameError: name 'spectrum' is not defined
Indeed the naive approach (orange) over-subtracts, leaving a conspicuous dip. The scalar and vector corrections both work fine!
Compute the Science-to-sky fiber ratio over the entire HPF bandpass¶
Let’s look at all the high signal-to-noise ratio sky emission lines— do they have the same ratios?
[7]:
import numpy as np
from specutils.manipulation import LinearInterpolatedResampler, SplineInterpolatedResampler
resampler = SplineInterpolatedResampler()
[8]:
twilight_ratio_raw = spectrum.get_static_sky_ratio_template()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[8], line 1
----> 1 twilight_ratio_raw = spectrum.get_static_sky_ratio_template()
NameError: name 'spectrum' is not defined
[9]:
full_science = spectra.stitch()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[9], line 1
----> 1 full_science = spectra.stitch()
NameError: name 'spectra' is not defined
Let’s resample the sky spectrum to the exact wavelength coordinates as the science spectrum.
[10]:
resampled_sky_spectrum = resampler(full_science.sky, full_science.wavelength)
resampled_sky_spectrum = HPFSpectrum(spectral_axis=resampled_sky_spectrum.wavelength,
flux=resampled_sky_spectrum.flux,
uncertainty=resampled_sky_spectrum.uncertainty)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[10], line 1
----> 1 resampled_sky_spectrum = resampler(full_science.sky, full_science.wavelength)
2 resampled_sky_spectrum = HPFSpectrum(spectral_axis=resampled_sky_spectrum.wavelength,
3 flux=resampled_sky_spectrum.flux,
4 uncertainty=resampled_sky_spectrum.uncertainty)
NameError: name 'full_science' is not defined
[11]:
full_sky = resampled_sky_spectrum.flux
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[11], line 1
----> 1 full_sky = resampled_sky_spectrum.flux
NameError: name 'resampled_sky_spectrum' is not defined
[12]:
full_snr = resampled_sky_spectrum.snr
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[12], line 1
----> 1 full_snr = resampled_sky_spectrum.snr
NameError: name 'resampled_sky_spectrum' is not defined
Let’s just pick the very high SNR pixels.
[13]:
full_mask = full_snr.value > 40
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[13], line 1
----> 1 full_mask = full_snr.value > 40
NameError: name 'full_snr' is not defined
[14]:
ax = resampled_sky_spectrum.plot(yhi=110);
(resampled_sky_spectrum*full_mask).plot(ax=ax);
ax.set_ylabel('Signal-to-noise ratio (per pixel)');
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[14], line 1
----> 1 ax = resampled_sky_spectrum.plot(yhi=110);
2 (resampled_sky_spectrum*full_mask).plot(ax=ax);
3 ax.set_ylabel('Signal-to-noise ratio (per pixel)');
NameError: name 'resampled_sky_spectrum' is not defined
Looks good, we are focusing on the highest-SNR emission lines.
[15]:
full_ratio = full_science.divide(resampled_sky_spectrum, handle_meta='ff').multiply(full_mask*full_science.flux.unit)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[15], line 1
----> 1 full_ratio = full_science.divide(resampled_sky_spectrum, handle_meta='ff').multiply(full_mask*full_science.flux.unit)
NameError: name 'full_science' is not defined
[16]:
ax = full_ratio.plot(ylo=0.8, yhi=1.1, linestyle='none', marker='o', ms=3, alpha=0.8, label='Blank Night Sky')
twilight_ratio_raw.plot(ax=ax, linestyle='none', marker='.', ms=1, label='Twilight Flats');
ax.set_title('HPF Science Fiber to Sky Fiber Ratio')
ax.axhline(1.0, linestyle='dotted', color='k')
ax.set_ylabel('Ratio')
ax.set_ylim(0.8, 1.05)
ax.legend();
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[16], line 1
----> 1 ax = full_ratio.plot(ylo=0.8, yhi=1.1, linestyle='none', marker='o', ms=3, alpha=0.8, label='Blank Night Sky')
2 twilight_ratio_raw.plot(ax=ax, linestyle='none', marker='.', ms=1, label='Twilight Flats');
4 ax.set_title('HPF Science Fiber to Sky Fiber Ratio')
NameError: name 'full_ratio' is not defined
Nice, we get about the same answer and trend using these data and the twilight flat data. Se see a high amount of scatter, but comparable to the scatter seen in the twilight flats themeselves (not shown). Overall the scatter must come in part from random noise, and possible some weak trends from genuine physics– truly different atmospheric temperatures, airmasses, pressures, wind shears, etc.
Deep dive: Do sky-free pixels show any other artifacts?¶
The background flux in most pixels is so close to zero that the denominator in the ratio has many near-zero values, sending the ratio spectrum to near-infinity. Here we compute the difference spectrum—scaled by the twilight flat vector—and divided by the Goldilocks-reported uncertainty vector. Most of the nominally empty pixels in the difference vector constructed in this way should have mean value of zero with a propagated \(\sigma\) of \(\sqrt{1^2 + 0.93^2}\).
[17]:
from scipy.stats import norm
import celerite2
from celerite2 import terms
[18]:
full_twilight_ratio = resampler(twilight_ratio_raw, full_science.spectral_axis)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[18], line 1
----> 1 full_twilight_ratio = resampler(twilight_ratio_raw, full_science.spectral_axis)
NameError: name 'twilight_ratio_raw' is not defined
Compute the residual, and normalize by the uncertainty.
\(\mathbf{r} = \mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}}\)
[19]:
residual = (full_science - full_twilight_ratio*full_sky)
sigmas = np.hstack([spec.uncertainty.array for spec in spectra])
residual_in_sigma = residual.flux.value/(np.hypot(1.0, 0.93)*sigmas)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[19], line 1
----> 1 residual = (full_science - full_twilight_ratio*full_sky)
2 sigmas = np.hstack([spec.uncertainty.array for spec in spectra])
3 residual_in_sigma = residual.flux.value/(np.hypot(1.0, 0.93)*sigmas)
NameError: name 'full_science' is not defined
Smooth the residual spectrum with a Gaussian Process.
[20]:
mask = np.isfinite(residual_in_sigma)
x = residual.wavelength.value[mask]
y = residual_in_sigma[mask]
N_pix = len(x)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[20], line 1
----> 1 mask = np.isfinite(residual_in_sigma)
2 x = residual.wavelength.value[mask]
3 y = residual_in_sigma[mask]
NameError: name 'residual_in_sigma' is not defined
[21]:
kernel = terms.SHOTerm(sigma=0.8, rho=200.0, Q=0.25)
gp = celerite2.GaussianProcess(kernel, mean=0.1)
gp.compute(x, yerr=np.ones(N_pix))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[21], line 4
1 kernel = terms.SHOTerm(sigma=0.8, rho=200.0, Q=0.25)
3 gp = celerite2.GaussianProcess(kernel, mean=0.1)
----> 4 gp.compute(x, yerr=np.ones(N_pix))
NameError: name 'x' is not defined
[22]:
plt.figure(figsize=(15, 5))
plt.plot(x, y, linestyle='none', marker='o', alpha=0.1, ms=3, color='#2980b9', label=None)
plt.plot(x, y*0-999, linestyle='none', marker='o', alpha=1, ms=3, color='#2980b9',
label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$')
plt.plot(x, gp.predict(y, x), linestyle='none', color='#e67e22', marker='.', ms=2, label=None)
plt.plot(x, y*0-999, color='#e67e22', label='Systematic bias')
#plt.plot(x, gp.predict(y_noise, x), linestyle='none', color='#95a5a6', marker='.', ms=2, label='IID')
plt.scatter(residual.wavelength.value[full_mask], residual_in_sigma[full_mask],
marker='o', s=29, ec='k', c='#f1c40f', zorder=10, label='High SNR Emission Line Pixels')
plt.axhline(0.0, color='k', linestyle='dashed')
plt.ylim(-5, 5)
plt.legend(ncol=3)
plt.xlabel('$\lambda \; (\AA)$')
plt.ylabel('Residual $(\sigma)$')
#plt.savefig('../../paper/data_analysis_I/figures/Scaled_sky_residual_trend.png', dpi=300, bbox_inches='tight')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[22], line 1
----> 1 plt.figure(figsize=(15, 5))
2 plt.plot(x, y, linestyle='none', marker='o', alpha=0.1, ms=3, color='#2980b9', label=None)
3 plt.plot(x, y*0-999, linestyle='none', marker='o', alpha=1, ms=3, color='#2980b9',
4 label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$')
NameError: name 'plt' is not defined
We see minor systematic trends with an amplitude of about <0.5 sigma, large enough to bloat the realized RMS from its expected value of \(\sqrt{1^2 + 0.93^2}\).
We can make a histogram to quantify the bloating scale.
[23]:
bins = np.arange(-15, 15, 0.2)
standard_normal = norm(loc=0, scale=1).pdf(bins)
refined_uncertainty = norm(loc=0.1, scale=1.3).pdf(bins)
[24]:
plt.hist(residual_in_sigma, bins=bins, density=True, label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$');
plt.plot(bins, standard_normal, lw=3,
label='Standard normal (Expected with accurate uncertainties)');
plt.plot(bins, refined_uncertainty, lw=2, color='k',
label='Observed: $\hat\sigma = 1.30\sigma_{\mathrm{G}} \quad \hat\mu = 0.1\sigma_{\mathrm{G}}$');
plt.xlabel('Residual ($\sigma_{\mathrm{G}}$)')
plt.hist(gp.predict(y, x), bins=bins, density=True, label='Systematic Bias');
plt.yscale('log')
plt.ylim(1e-4, 1e3)
plt.xlim(-6, 6)
plt.legend(loc='best')
#plt.savefig('../../paper/data_analysis_I/figures/Scaled_sky_residual_uncertainty.png', dpi=300, bbox_inches='tight')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[24], line 1
----> 1 plt.hist(residual_in_sigma, bins=bins, density=True, label=r'Residual ($\mathbf{f}_{\star} - \mathbf{\beta} \; \odot \mathbf{f}_{\mathrm{sky}})$');
2 plt.plot(bins, standard_normal, lw=3,
3 label='Standard normal (Expected with accurate uncertainties)');
4 plt.plot(bins, refined_uncertainty, lw=2, color='k',
5 label='Observed: $\hat\sigma = 1.30\sigma_{\mathrm{G}} \quad \hat\mu = 0.1\sigma_{\mathrm{G}}$');
NameError: name 'plt' is not defined
We see find a realized uncertainty of \(1.3\times\) the expected value, in other words Goldilocks underestimates the uncertainty by 1.3x, attributable to this systematic bias trend. I get the same trend if I use a scalar (\(\bar \beta=0.93\)) prefactor to the sky fiber, so I know I am not injecting this signal artificially from the twilight flats. All things considered, this systematic bias is pretty tiny and immaterial for most scientific applications.