I remembered seeing, on a very bad copy of the TS1500 (the official US ZX81 clone) schematic, that its cassette input also used a transistor, so I looked it up. The schematic was very nearly unreadable, but I could still get the idea behind the circuit. The capacitor and resistor values could not be read with 100% certainty; I could guess many of them, but not the values of C1 and R2.
It reminded me of the initial transistor schematic I had, but with some important changes: specifically the pull-up resistor that generates the base bias, and the 220 ohm emitter resistor.
So I made an LTspice simulation schematic, and discovered that with the right components it really did work well.
This is the setup with the optimized values I came up with:

and this is a printout of the simulated waveforms:

Looks good, doesn't it? The green trace is the input signal, and the blue-grey trace is the generated output signal.
I did some more testing after I made these pictures, and discovered a few more things:
I propose increasing C1 to 10 µF. Otherwise, after a long run of bursts, the output does not drop to 0 V immediately, but stays for some time in the dangerous zone above 1 V. This is especially true for capacitances of, say, 100 nF. Less essential, but still an improvement, would be to add a resistor of about 1K from the input to GND; it also influences how the output returns to GND after a burst of pulses.
As you can see, the negative-going pulse drives the output high, not the positive one, but that does not matter; like you said, it only causes a 150 µs delay. This design works well with inputs of about 1 V peak-to-peak. It also works with smaller signals, but then (for a sinusoidal signal) the width of the generated output pulse changes somewhat. Larger input signals are better. R5 helps to get output pulses with an almost 50% duty cycle, so it's definitely an improvement over leaving it out. The value of R2 is essential for correct operation, as it sets the bias voltage; changing it too much means the circuit either does not work at all, or works only with larger input signals.
I think that with this input circuit the ZX81 cassette input will improve dramatically, and work with HCT logic. It also protects against ESD impulses.
The only further improvement I foresee is a very small capacitor across the 1K input termination resistor, which would filter out unwanted HF signals that could cause spurious pulses. But I must see if I can find space in my design to add one.
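To get a feel for what "very small" could mean here, the following is a rough Python sketch of the first-order RC corner frequency such a capacitor would give. It is only an estimate under stated assumptions: the capacitor sees the 1K termination in parallel with the source impedance of whatever drives the input, the 1K source value and the candidate capacitor values are made-up examples, and the loading by the transistor stage is ignored. I have not simulated or built this.

```python
import math


def parallel(r1, r2):
    """Resistance of two resistors in parallel, in ohms."""
    return r1 * r2 / (r1 + r2)


def lowpass_corner_hz(r_source, r_term, c_filter):
    """-3 dB corner of the first-order low-pass formed by c_filter
    across r_term, driven from a source with resistance r_source."""
    r_eff = parallel(r_source, r_term)
    return 1.0 / (2.0 * math.pi * r_eff * c_filter)


R_TERM = 1_000.0    # the 1K termination resistor from the text
R_SOURCE = 1_000.0  # ASSUMPTION: output impedance of the tape player

# ASSUMPTION: candidate capacitor values, chosen only for illustration
for c in (1e-9, 4.7e-9, 10e-9):
    f = lowpass_corner_hz(R_SOURCE, R_TERM, c)
    print(f"C = {c * 1e9:4.1f} nF -> corner at about {f / 1e3:6.1f} kHz")
```

The corner should stay well above the frequency content of the tape pulses, so that only the HF noise is attenuated and the pulse edges are not rounded off. Since the real source impedance is unknown, the numbers are only a starting point for a simulation run.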