X Talk

by Brian Shimanuki

Answer:
↗PIANO

Each segment of the video has a noisy image and noisy audio. Each image/audio pair shares something in common, and that commonality fits the enumeration shown in the corner of the image.

Image               | Audio                   | Commonality
Jazz Combo          | Wombo Combo             | COMBO
Rock subgenre       | Insurance jingle mashup | PROGRESSIVE
Led Zeppelin IV     | Independence Day theme  | FOURTH
Calculator function | Musical inversions      | INVERSE
West Side Story     | Inside Out theme        | SIDE

Meanwhile, the noise in both the images and the audio is suspicious. The title and flavortext are suggestive of crosstalk.

Looking closer at the files in the zip file, we can see that each image is 420x420 grayscale, and each audio segment is 4 seconds at 44100Hz. Each of these therefore has 176400 samples, and in both media each sample is represented by a single byte.
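The size match can be checked with a line or two of arithmetic. The snippet below is only a sanity check in Python; the variable names are chosen here for illustration and do not come from the puzzle files.

```python
# Dimensions quoted in the write-up.
WIDTH = HEIGHT = 420      # pixels per side of each grayscale image
SECONDS = 4               # length of each audio segment
SAMPLE_RATE = 44100       # audio samples per second

pixels_per_image = WIDTH * HEIGHT
samples_per_clip = SECONDS * SAMPLE_RATE

# Both media carry the same number of one-byte samples.
print(pixels_per_image, samples_per_clip)
```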

We need to convert the images to audio and vice versa. To do this, we treat each image as a progressive scan. (If the five commonality words have been obtained by this point, they vaguely hint at converting to get the inverse sides, but the conversion step is attainable even before identifying all of the initially given data.)

By treating the 2D images as an audio signal scanning across each row in order, we get 5 new audio clips. (Here is an example script.) Likewise, by taking the 1D audio signals and shaping them into a 420x420 square, we get new images. These crosstalk image/audio pairs work in the same way and we can find a commonality between each pair.
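As a rough illustration of the conversion described above (this is not the example script referenced in the text), the following Python sketch uses only the standard library. It assumes the pixel bytes and audio samples have already been decoded into raw `bytes` objects, row by row for the image. The function names, the choice of PGM as the output image format, and the reuse of 44100Hz for the converted audio are assumptions made here for illustration.

```python
import io
import wave

WIDTH = HEIGHT = 420   # image size from the write-up
RATE = 44100           # assumed playback rate for the converted audio


def image_to_wav_bytes(pixels: bytes, rate: int = RATE) -> bytes:
    """Treat row-major 8-bit grayscale pixels as unsigned 8-bit mono PCM."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(1)      # one byte per sample
        w.setframerate(rate)
        w.writeframes(pixels)  # rows in order, i.e. a progressive scan
    return buf.getvalue()


def samples_to_pgm_bytes(samples: bytes, width: int = WIDTH,
                         height: int = HEIGHT) -> bytes:
    """Shape a 1D run of 8-bit samples into a binary PGM (P5) image."""
    if len(samples) != width * height:
        raise ValueError("sample count does not match the image size")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + samples    # PGM stores pixels row by row
```

Writing the returned bytes to `.wav` and `.pgm` files should give something most audio players and image viewers can open. Decoding the puzzle's actual image and audio files into raw bytes is outside this sketch and would need a suitable library.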

The new images additionally each have a rebus clue with two blanks. Filling in the blanks with the two commonalities associated with that pair (the original one and the crosstalked one) gives a new word.

Original Pair | Crosstalked Pair          | Rebus Result | Rebus Explanation
COMBO         | TURBINE                   | COMBINE      | Cross COMBO and TURBINE at the B
PROGRESSIVE   | COUNTESS                  | TWO          | COUNT of ESS (the letter S) in PROGRESSIVE
FOURTH        | TIME                      | DIMENSION    | TIME is the FOURTH DIMENSION
INVERSE       | PERIOD                    | FREQUENCY    | FREQUENCY is the INVERSE of PERIOD
SIDE          | THE DARK SIDE OF THE MOON | TRANSFORM    | DARK OF THE MOON (without SIDE) is a TRANSFORMERS movie

We get an instruction: COMBINE TWO DIMENSION FREQUENCY TRANSFORM. This tells us to apply a two-dimensional Fourier transform to the 2D medium, the images. This can be done with code or online: http://bigwww.epfl.ch/demo/ip/demos/FFT/ or https://ejectamenta.com/imaging-experiments/fourifier/.

The 2D Fourier transform of each of the crosstalked images is an image with a mostly black square in the corner with bits of white. If we combine all 5 images (by taking the brightest, the average, or the sum -- any should work), we get an image of an UPRIGHT PIANO. Because parts of answers in this subround turn into arrows, this becomes ↗PIANO.
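For solvers doing this step in code, the transform and combination can be sketched in Python with NumPy as below. This is only one plausible approach, assuming each crosstalked image is already loaded as a 2D array. The function names, the mean subtraction (to stop the constant term from dominating), and the log scaling for display are choices made here, not details taken from the puzzle.

```python
import numpy as np


def fft_magnitude(image):
    """Log-scaled magnitude of the 2D DFT, normalised to the range [0, 1]."""
    data = np.asarray(image, dtype=float)
    data = data - data.mean()             # assumption: remove the DC term
    mag = np.log1p(np.abs(np.fft.fft2(data)))
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def combine_brightest(images):
    """Combine same-sized images by taking the brightest value per pixel."""
    return np.max(np.stack(images), axis=0)


def combine_average(images):
    """Combine same-sized images by averaging per pixel."""
    return np.mean(np.stack(images), axis=0)
```

With a hypothetical list `crosstalked` holding the five 420x420 arrays, `combine_brightest([fft_magnitude(im) for im in crosstalked])` would give a single array to view as a grayscale image.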