AI Data Engineer
Bright Vision Technologies
About This Role
Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications.
As we continue to grow, we're looking for a skilled AI Data Engineer to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.
Job Title: AI Data Engineer
Location: 100% Remote (Continental United States)
Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
Experience: 6+ years
Salary Range: $100k to $150k per annum
Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
Employment Type: Full-time, direct W2 with Bright Vision Technologies (no C2C, no 1099, no third-party)
Engagement: Long-term, multi-year, aligned to the Bright Vision SOW delivery roadmap
Compensation: Competitive base salary commensurate with experience, plus benefits.
Employment Terms & Visa Policy
This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies.
This role is part of Bright Vision Technologies' in-house Statement of Work (SOW) engagement. Bright Vision Technologies is the client, end customer, and employer for this position; there is no third-party client, vendor, or implementation partner involved.
We do not engage in C2C, 1099, or third-party arrangements for this role.
All of our roles are strictly W2. Third-party companies and brokers, please do not apply.
Candidates must be willing to work directly as a full-time W2 employee of Bright Vision Technologies and contribute to our in-house SOW deliverables.
No new H1B sponsorship is available for this role.
However, candidates who are currently on a valid H1B visa and require a transfer are welcome to apply. We will support H1B transfers for qualified candidates.
For every role, a technical coding assessment is mandatory. Please apply only if you are confident in your technical abilities and hands-on experience.
Job Summary
We are seeking an AI Data Engineer to build and operate the large-scale data systems that power modern AI training and evaluation pipelines. The role combines deep data engineering expertise with a strong understanding of AI workloads, focusing on ingestion, transformation, quality assurance, lineage, and high-throughput delivery of data to training jobs across diverse modalities. The ideal candidate has experience operating petabyte-scale data systems, strong software engineering fundamentals, and a clear understanding of how data infrastructure choices propagate into model quality and training efficiency.
Key Responsibilities
• Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
• Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
• Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
• Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
• Build high-throughput data loading systems that maximize GPU utilization during training.
• Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
• Design storage architectures balancing cost, throughput, and latency across data tiers.
• Build evaluation dataset construction pipelines with strict integrity and contamination controls.
• Implement data privacy, redaction, and consent enforcement throughout the pipeline.
• Collaborate with ML researchers and engineers to align data systems with model development needs.
• Drive observability of data quality, drift, and pipeline health across the AI data estate.
• Optimize cost and performance through compression, format selection, and caching strategies.
• Document data systems, schemas, and operational procedures for broad internal use.
• Stay current with AI data infrastructure research and emerging open-source tools.
Required Qualifications
• Bachelor's or Master's degree in Computer Science or a related field.
• Six or more years of data engineering experience, with significant work supporting ML or AI workloads.
• Strong proficiency in Python and at least one JVM or systems language.
• Deep experience with modern data processing frameworks such as Spark, Ray, or Beam.
• Hands-on experience operating petabyte-scale storage and pipeline systems.
• Strong understanding of distributed systems, data modeling, and storage formats.
• Experience with dataset versioning, lineage, and reproducibility for ML workflows.
• Familiarity with high-throughput data loading for accelerator-based training.
...