Design & Build Data Infrastructure: Architect and implement scalable, high-performance data infrastructure focusing on event-driven architectures, real-time data streaming, and advanced AI-driven applications.
Event-Driven Data Solutions: Develop event-driven systems leveraging tools like Apache Kafka or similar technologies to support real-time data processing and low-latency pipelines.
Hands-on Development: Actively develop and maintain data pipelines, ETL/ELT processes, and event-streaming solutions using Apache Kafka, Apache Flink, Apache Spark, or similar tools, as well as AI-specific data systems.
Database Management: Manage and optimize SQL, NoSQL, OLAP, and vector databases to ensure high availability, scalability, and performance, leveraging deep knowledge of database internals and mastery of concepts such as partitioning, sharding, embeddings, distributed database systems, and change data capture (CDC) to drive efficiency and reliability across complex, large-scale environments.
Data Integration: Build real-time and batch data pipelines that integrate structured and unstructured data from various sources, including AI models and third-party data sources.
Performance Tuning: Continuously monitor and optimize data systems for performance, ensuring that AI workloads are supported by highly efficient data pipelines and storage solutions.
Collaboration: Work closely with product managers, software engineers, and data scientists to align event-driven architectures, vector databases, and data pipelines with the needs of AI and machine learning models.
Cloud Architecture: Architect and manage cloud-based data solutions (AWS preferred; GCP or Azure) that support distributed data processing, AI workloads, and real-time data streaming.
Vector Databases: Design and implement vector-based databases (e.g., Pinecone, pg_vector, Milvus) to support machine learning models, including Generative AI applications, efficiently handling high-dimensional data such as embeddings and unstructured data.
Minimum Qualifications: Education: Bachelor's degree in Computer Science, Civil Engineering, Electrical Engineering, or a related field of study; will accept equivalent foreign degree*;
Experience: Two (2) years in the position above, as a Data Engineer, or in a related role;
Experience must include:
1. Implementing complex ETL pipelines;
2. MongoDB and Kafka;
3. Salesforce, Stripe, AWS S3 and GCP; and
4. Python, Redshift, Stripe, Athena and BigQuery.
*Will accept no degree and four (4) years of related experience in lieu of a Bachelor’s degree and two (2) years of experience; will also accept any fully equivalent combination of education, training, and/or experience.
Location: 8800 Lyra Dr., Ste. 200, Columbus, Ohio 43240. Remote position; work can be performed anywhere in the U.S. Domestic travel within Ohio required up to 30% of the time for project meetings.
Send resume, referencing job code LC25-101, to:
Kenya Dreher
Senior People Business Partner
Arcos, LLC
8800 Lyra Dr. Ste. 200
Columbus, Ohio 43240