AI Data Engineer

Bright Vision Technologies

United States
$100,000 - $150,000 / year
full-time
senior
Posted July 4, 2026
via himalayas

About This Role

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we're looking for a skilled AI Data Engineer to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.

Job Title: AI Data Engineer
Location: 100% Remote (Continental United States)
Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
Experience: 6+ years
Salary Range: $100k to $150k per annum
Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
Employment Type: Full-time, direct W2 with Bright Vision Technologies (no C2C, no 1099, no third-party)
Engagement: Long-term, multi-year, aligned to the Bright Vision SOW delivery roadmap
Compensation: Competitive base salary commensurate with experience, plus benefits.

Employment Terms & Visa Policy

This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies. This role is part of Bright Vision Technologies' in-house Statement of Work (SOW) engagement. The client, end customer, and employer for this position is Bright Vision Technologies; there is no third-party client, vendor, or implementation partner involved. We do not engage in C2C, 1099, or third-party arrangements for this role. All of our roles are W2, and we do not accept third-party brokering. Candidates must be willing to work directly as a full-time W2 employee of Bright Vision Technologies and contribute to our in-house SOW deliverables. No new H1B sponsorship is available for this role.
However, candidates who are currently on a valid H1B visa and require a transfer are welcome to apply. We will support H1B transfers for qualified candidates.

A technical coding assessment is mandatory for every role. Please apply only if you are confident in your technical abilities and hands-on experience.

Job Summary

We are seeking an AI Data Engineer to build and operate the large-scale data systems that power modern AI training and evaluation pipelines. The role combines deep data engineering expertise with a strong understanding of AI workloads, focusing on ingestion, transformation, quality assurance, lineage, and high-throughput delivery of data to training jobs across diverse modalities. The ideal candidate has experience operating petabyte-scale data systems, strong software engineering fundamentals, and a clear understanding of how data infrastructure choices propagate into model quality and training efficiency.

Key Responsibilities

• Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
• Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
• Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
• Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
• Build high-throughput data loading systems that maximize GPU utilization during training.
• Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
• Design storage architectures balancing cost, throughput, and latency across data tiers.
• Build evaluation dataset construction pipelines with strict integrity and contamination controls.
• Implement data privacy, redaction, and consent enforcement throughout the pipeline.
• Collaborate with ML researchers and engineers to align data systems with model development needs.
• Drive observability of data quality, drift, and pipeline health across the AI data estate.
• Optimize cost and performance through compression, format selection, and caching strategies.
• Document data systems, schemas, and operational procedures for broad internal use.
• Stay current with AI data infrastructure research and emerging open-source tools.

Required Qualifications

• Bachelor's or Master's degree in Computer Science or a related field.
• Six or more years of data engineering experience, with significant work supporting ML or AI workloads.
• Strong proficiency in Python and at least one JVM or systems language.
• Deep experience with modern data processing frameworks such as Spark, Ray, or Beam.
• Hands-on experience operating petabyte-scale storage and pipeline systems.
• Strong understanding of distributed systems, data modeling, and storage formats.
• Experience with dataset versioning, lineage, and reproducibility for ML workflows.
• Familiarity with high-throughput data loading for accelerator-based training.
...
