Multimodal AI Systems Architect (AI Engineering)

Hyphenconnect
📍 Boston, USA 📅 Posted April 24, 2026

About this role

We are seeking a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. The role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities, with an emphasis on system efficiency.

Responsibilities:

• Integrate vision encoders and audio-native models into core agent reasoning loops.

• Optimize streaming latency for voice-to-voice AI interactions.

• Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.

Qualifications:

• Experience with Whisper, CLIP, and multimodal LLM integration.

• Knowledge of streaming architectures and WebRTC.

• Expertise in cross-modal alignment.

This listing was aggregated by Perik.ai from Hyphenconnect’s public job board.